









Turn Your 
Designing 
Maze Into 
Amazing 
Designs. 




SES/workbench™ 

System-Level Design Simulation and Modeling Software 
Now there is an easy-to-use, elegant 
tool that guides you through the maze 
of complex systems design and helps 
you evaluate design alternatives 
to discover the consequences of 
decisions before you commit to them. 
SES/workbench. 

SES/workbench is the premier 
multi-level design environment from 
the established leaders in system 
simulation, modeling and design 
evaluation, Scientific and Engineering 
Software, Inc. It can be used to 
evaluate design decisions throughout 
the development cycle, from the 
earliest conceptual stages, through 
system specification and design, to 
functional and performance 
verification. 


SES/workbench advances 
the state of the art in 
system-level design and 
evaluation software. 


Its unique graphical interface, 
SES/design, allows you to create a 
pictorial representation of a system’s 
structure and to specify its behavior 
at a high level. 

• You can quickly analyze design 
alternatives and tradeoffs with a few 
clicks of a mouse button. 


Representation of 
the system evolves naturally 
from a high-level 
behavioral description 
to a low-level structural 
description. 

The graphical 
representation of the 
system is translated 
into SES/sim, an object-based 
superset of C and C++ 
that offers many extensions 
developed specifically for 
simulation modeling. 

Comprehensive statistical 
reports provide a precise 
description of the system’s 
behavior, so that you can evaluate 
system performance. 

Free trial licenses are available. 
Call or write to us for additional 
information. 


SES/workbench. 

A testbed for the imagination. 


Scientific and Engineering Software, Inc. 

1301 West 25th Street, Suite #300 
Austin, Texas, U.S.A. 

Phone: 512/474-4526 Fax: 512/479-6217 



Using SES/design you 
can easily construct 
graphical representations 
of highly complex systems. 
SES/design graphs are 
automatically translated 
into SES/sim, the powerful 
simulation language at the 
heart of SES/workbench. 
Breakthrough in presentation of simulation results 
SIMSCRIPT II.5 with SIMGRAPHICS 
Now you see an animated picture of the system 




Local Area Network: NETWORK II.5®    Factory in operation: SIMFACTORY II.5® 



Free trial and training 

See for yourself how simulation 
results are now easier to understand. 

The free trial contains everything 
you need to try SIMGRAPHICS™ 
on your computer. 

We send you SIMSCRIPT II.5, 
animated models, and complete 
documentation. You can build your 
own model or modify one of ours. 

Try the SIMSCRIPT II.5® language, 
the timeliness of our support, 
the accuracy of our documentation, 
and the facilities for error checking: 
everything you need for a successful 
project. 

For 28 years CACI has provided 
trial use of its simulation software: 
no cost, no obligation. 

Act now for free training 

For a limited time we will also 
include free training. 

For immediate information 

Call Doug Dittrich at (619) 
457-9681. In Europe, call Nigel 
McNamara on (081) 332-0122. In 
Canada, call (613) 747-7467. 



Transportation system 


SIMGRAPHICS advanced user interface 
A Symposium on High Performance Chips 

Santa Clara University, Santa Clara, California 

9:00-10:30 

Bruce Lightner, Metaflow Technologies, San Diego, CA 

The SPEC and Perfect Club Benchmarks: Promises and Limitations 

Rafael Saavedra-Barrera, University of California, Berkeley 

John Doerr is a general partner with Kleiner Perkins Caufield & Byers, and serves on the board of directors of SUN Microsystems 
and Cypress Semiconductors. Mr. Doerr holds Bachelor and Master degrees from Rice University in Electrical Engineering and 
Computer Science, and an MBA from Harvard. 

LETTERS 


Ethical relativism 


To the Editor: 

After reading Michael C. McFarland’s article “Urgency of Ethical Standards Intensifies in Computer Community” in the March 1990 Computer (Standards Dept., pp. 77-81), I must write in defense of ethical relativism. 

McFarland has ethical relativism confused with ethical nihilism. Although an ethical relativist acknowledges that values are a product of biology, cultural environment, and life experiences, the values are still real. And nothing stops those holding them from either acting on those values or seeking to impose them on others. Relativist beliefs may have a psychologically inhibiting effect on behavior, in that one may be less inclined to impose subjective values on others than values believed to be objective, but this is not a logical consequence of relativism. 

Thus, ethical relativism does not, as McFarland says, make it “impossible to ever critique another person’s behavior.” It only makes it impossible to critique behavior on the basis of absolute values. An ethical relativist can say, “Slavery is wrong,” but cannot say, “Slavery is wrong because God said so,” or “Slavery has always been objectively wrong, but it was only recently that humanity discovered this ethical fact.” 

In particular, ethical relativism is completely consistent with a professional society’s establishing ethical standards. 

We can all get together and say what we 
think is right (if we can agree) without 
necessarily taking a position on whether 
our opinions are based on eternal values 
or are just our opinions. 

I agree with McFarland’s conclusions 
that ethics are important and the IEEE or 
IEEE Computer Society should get more 
involved in ethical issues. 

Edward Newman 

Santa Clara, California 


Author’s Reply: 

I think Edward Newman and I agree on the definition of ethical relativism. It is on its implications that we differ. Certainly, as Newman says, someone who holds a relativist position can have a strong set of ethical values and can live them out with admirable dedication and courage. One can also seek out like-minded people to share one’s commitments. 

There are, however, deeper issues that relativism does not address, to my satisfaction at least. The most important, from a practical point of view, is what to do in the face of this uncertainty and disagreement on ethical issues. If rational argument is to have any value, one must posit something intelligible toward which the argument reaches. This does not have to be an overarching authority, personal or impersonal, or a set of fixed rules “wired into” the system. But if there is no common basis for ethical reasoning, however imperfectly grasped, I don’t see how you can avoid the conclusion that a white supremacist’s sincerely held position, for example, is just as valid as anyone else’s. The alternative is to assert that it is best for each individual to follow his or her ethical instincts or enlightened self-interest, and that will result in the best overall ethical situation. But then one must account for why these individual paths can be expected to converge on what is best, and for that matter how we can even talk about what is best. 

When we get to the level of ultimate verities, we touch deeply held personal commitments, where differences are not easily discussed, let alone resolved. Fortunately, as Newman points out, this need not discourage us as a society from becoming engaged in the important ethical issues that confront our profession. In this, I am grateful for his support. 

Michael C. McFarland 

AT&T Bell Laboratories 

Murray Hill, New Jersey 


Classification arbitrary 

To the Editor: 

The classification used in “A Hierarchical Taxonomic System for Computer Architectures” (Computer, Mar. 1990, p. 64) seems to be rather arbitrary. Three of the seven “atoms” are devoted to memory access methods (cache, simple memory, interleaved memory) that could arguably be called low-level implementation details. 

On the other hand, truly significant properties of computer architectures like topologies and communication methods are missing entirely from the proposed taxonomy. For example, in Dasgupta’s scheme, the Illiac IV and the Connection Machine share the same data path description (sM.sX); this ignores the fundamental differences between array and hypercube topologies. 

And how does Dasgupta distinguish between systolic and wavefront arrays? Both array architectures would share the same formula in his scheme, while in reality their different communication methods (clock-driven vs. data-driven) put them in very different leagues. 

In short, I find Dasgupta’s taxonomy of 
weak discriminatory power. 

Ulrich Schmidt 

ITT Intermetall, Freiburg, FRG 


Author’s Reply: 

A taxonomic scheme presents a particular viewpoint. To this extent all such schemes have an element of arbitrariness. However, I find it strange that Ulrich Schmidt regards cache memories, parallel memories, etc., as “arguably low-level implementation details.” After all, a large part of architectural thinking, past and present, is devoted to precisely such issues. 

As to Schmidt’s other comments, I am at a loss as to how I can respond to the charge of ignoring such airy concepts as “truly significant properties” or “fundamental differences.” But let me try. 

The overall purpose of a taxonomic scheme is to provide a unified conceptual framework by means of which we can classify objects in a succinct and elegant manner. The scheme may not be able to cover all aspects of the objects of interest. We may, then, decide to extend it arbitrarily, in which case it will inevitably become dreadfully cumbersome. Or, we may leave it intact and construct independent schemes for these other aspects. In the case of architecture, my proposal would constitute one such scheme, while another may deal with communication methods, a third with instruction set aspects, and so on. I would definitely prefer this latter approach. 

Which one would Schmidt choose? 

Subrata Dasgupta 

University of Southwestern Louisiana 
BEFORE OSF/MOTIF AND TELEUSE, THERE WAS A WORD 
FOR DEVELOPING GRAPHICAL USER INTERFACES. 


Building user interfaces for window-based applications on high-powered workstations is 
pure hell for programmers. Since the user interface generally comprises about 60 percent 
of the total code for an interactive application, it’s also a very time-consuming and costly 
process. Now, however, there’s TeleUSE™, the industry’s most advanced user interface 
management system (UIMS). This integrated set of interactive tools allows programmers 
to rapidly prototype, design, implement, evaluate and debug standardized graphical user 
interfaces for OSF/Motif, independently of the application code. It’s a dramatic productivity 




"We believe the time is right for 
OSF/Motif to become a powerful 
and accepted standard in the 
marketplace. We are excited that 
TeleSoft has chosen to base 
their TeleUSE graphical user 
interface builder on OSF/Motif, 
and look forward to their success 
in the marketplace." 


breakthrough for anyone developing OSF/Motif applications. With the TeleUSE graphical 
layout editor and unique dialogue manager, developers can focus more on the design, esthetics 
and viability of the user interface rather than writing code. And cut the development time by 
50 to 90 percent. For complete details on TeleUSE, call TeleSoft at (619) 457-2700, 
FAX (619) 452-1334, or write TeleSoft, 5959 Cornerstone Court W., San Diego, CA 92121. 
And let us show you a heavenly way to do windows. 


Kathryn Birkbeck, 

OSF/Motif Technology Manager 
Open Software Foundation 



Using a WYSIWYG approach, software 
developers use the TeleUSE layout 
editor to actually paint the static screen 
layout on a graphics workstation using 
building block interface objects such 
as buttons, scroll bars, pop-up menus 
and dialogue boxes. 


TeleSoft 


PROGRAMMED FOR PRODUCTIVITY 
























The joy of C-scape 


The C-scape™ Interface 
Management System frees C 
programmers from the tedium of 
coding windows, menus, data 
validation, help, and text editing 
functions. 

Moreover, C-scape is a joy to use. With 
C-scape’s object-oriented design, you’ll 
build more functional, more flexible, 
more portable, and more unique 
applications—and you'll have more fun 
doing it. 

The industry standout. Many 

thousands of programmers have quit 
home-grown libraries and cumbersome, 
inflexible products for the pleasure of 
C-scape. The press agrees: “C-scape is by 
far the best... a joy to use,” wrote IEEE 
Computer. PC Magazine chose C-scape 
to produce its Laboratory Benchmark 

Series 5.0 software because 
C-scape offers mouse 
support. Moreover, C-scape 
simultaneously combines 
text and graphics. And because C-scape 
makes it easy to create your own custom 
routines, major companies have selected 
C-scape as a standard for software 
development. 

C-scape is built around an open 
architecture, so you can use it with data 
base management or other C libraries. 


C-scape Features 

Graphics. Combine high-resolution color graphics 
with text or menus. 


Mouse. Use any standard mouse for fast screen 
control. 

Portability. Write hardware independent code. 
Supports DOS, OS/2, UNIX, others. Autodetects 
Hercules, CGA, EGA, VGA. 

Text editing. Create a full-featured text editor or 
pop-up note pad. 

Field flexibility. Create masked, protected and 
marked fields with complete data validation. Use 
time, date, money, pop-up list, and many more 
functions, or create your own. 

Windows. Choose from pop-up, tiled, bordered 
and exploding windows, with size and numbers 
limited only by RAM. 

Menus. Choose from pop-up, pull-down, 123-style, 
or slug menus, or create your own. 

Context-sensitive help. Link help messages to 
individual screens or fields. Cross reference 
messages to create hypertext-like help. 

Screen design. Build any type of screen or form 
with the Look and Feel™ Screen Designer, then 
automatically convert it to C. 

Screen flexibility. Call screens from files at 
run time or link them in. 


And to port from MS-DOS or OS/2 to 
UNIX, just recompile. 

Trial with a smile. C-scape is not 
only the most sophisticated, flexible and 
powerful interface system available, it’s 
also the most friendly—and easiest to 
use. Try C-scape on a 30-day trial. It 
comes with a thorough manual, demo 

disk, sample programs with 
source code, an optional 
screen designer and code 
generator, access to a 
24-hour bulletin board, and toll-free 
support. No royalties, runtime licenses, or 
runtime modules. After you register, you 
get complete library source code at no 
extra cost. 

Call 800-233-3733 (617-491-7311 

in Mass.) to try C-scape now. After the 
joy of C-scape, programming will never be 
the same. 

MS-DOS, OS/2: $399, library only; with 
Look & Feel, $499. UNIX, XENIX, Apollo, 
Sun, Stratus, others please call. 
Mastercard and Visa accepted. 





Object-oriented. Add features and create 
reusable code modules. 



Oakland Group, Inc. 675 Massachusetts Ave., Cambridge, MA 02139 USA. FAX: 617-868-4440; Washington 206-746-8767; Benelux (02159)46814; Denmark 
(02)88 72 49; France (1)46 09 28 28; Germany/Austria/Switzerland (49)07127/5244; Norway (02)44 88 55; Sweden (013)124780; U.K. (0992)500919. C-scape and 
Look & Feel are trademarks of Oakland Group, Inc. MS-DOS and XENIX are trademarks of Microsoft Corp. OS/2 is a trademark of International Business 
Machines Corp. UNIX is a trademark of AT&T. HERCULES is a trademark of Hercules Computer Technology, Inc. Prices and terms subject to change. 

Photo by Jessica A. Boyatt. Kanji by Kaji Aso. 
GUEST EDITORS’ INTRODUCTION 



Cache Architectures in 
Tightly Coupled 
Multiprocessors 

Michel Dubois, University of Southern California 
Shreekant Thakkar, Sequent Computer Systems 


Multiprocessing is a popular way to increase system computing power beyond the limit of current uniprocessor technology. In a multiprocessor, multiple instruction streams execute in parallel and communicate or synchronize by passing messages or sharing memory. 

We can classify multiprocessor systems according to the logical and physical architecture of their memory systems. In systems with disjoint memories (also called distributed systems or multicomputers), processors can access only their own memory, and they communicate by sending messages to each other. In systems with shared memory (also called shared-memory systems or tightly coupled systems), processors exchange data and synchronize through a global address space, accessible by all processors. 

June 1990 


Physically, the shared memory of a tightly coupled multiprocessor can be distributed among processors; these architectures are commonly referred to as nonuniform-memory-access (NUMA) architectures because the access time to shared memory that is local to a processor is very different from the access time to remote sections of shared memory. NUMA machines are difficult to program because their performance is sensitive to the allocation of shared data structures to physical memory modules. In uniform-memory-access (UMA) machines, all global memory is accessed through a common interconnection, so access time to any shared-memory location is uniform across processors. 

Efficient tightly coupled multiprocessor systems are difficult to design because accesses to shared memory can drastically reduce each processor’s speed. There are two aspects to the shared-memory access problem. The first, and possibly most important, is access latency, that is, the delay between the issuance of a memory access by a processor and its completion. The second problem is contention among accesses from different processors. 

0018-9162/90/0600-0009$01.00 © 1990 IEEE 

Caching is a common approach to solving both problems. Caches have proven effective in uniprocessors, 1 but they are even more effective in multiprocessors because they alleviate the contention problem. High-speed caches connected to each processor maintain local copies of memory locations and supply operands and instructions at the rate required by each processor. 

The existence of multiple copies of data gives rise to the coherence problem among private caches. Multiple copies of the same memory word must be kept consistent in different caches in the presence of process migration from processor to processor and of sharing of writable data. Clearly, a store to a data word present in different caches must, in general, propagate to those caches in the form of invalidations or updates. 
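
The propagation of a store as invalidations can be illustrated with a small sketch. This is a toy model written for this illustration, not a protocol from any of the articles: the class name `ToyCoherentSystem`, the write-through policy, and the word-sized blocks are all assumptions chosen to keep the example short.

```python
class ToyCoherentSystem:
    """Toy write-invalidate model: one shared memory, several private caches."""

    def __init__(self, num_caches):
        self.memory = {}                                    # address -> value
        self.caches = [dict() for _ in range(num_caches)]   # per-processor copies
        self.invalidations = 0                              # invalidation messages sent

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                  # miss: fetch the word from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        # Propagate the store as invalidations to every other cache with a copy.
        for other, cache in enumerate(self.caches):
            if other != cpu and addr in cache:
                del cache[addr]
                self.invalidations += 1
        self.caches[cpu][addr] = value
        self.memory[addr] = value              # write-through (assumption, for brevity)


system = ToyCoherentSystem(num_caches=2)
system.read(0, 0x10)           # both processors cache address 0x10
system.read(1, 0x10)
system.write(0, 0x10, 42)      # processor 0 stores; processor 1's copy is invalidated
print(system.read(1, 0x10))    # processor 1 misses and re-fetches: prints 42
```

An update-based variant would overwrite the other caches' copies instead of deleting them; the choice trades extra data traffic on each store against extra misses after it.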

The presence of caches in multiprocessors may affect the machine’s logical model at the instruction level. If the hardware enforces sequential consistency, 2 then cache management is totally transparent to the software. Many times, however, coherence is only enforced at explicit synchronization points. Systems not supporting, or only partially supporting, coherence among caches require compiler assistance. 

Research topics. Under current technology trends, the gap between processor and memory speeds is increasing. Because of this and because of the relative ease of programming UMA machines, we believe caches will be a lasting and necessary feature of high-performance multiprocessors, especially for general-purpose computing. The following issues related to multiprocessor caches are active research topics. 

New, scalable architectures supporting private caches. Other than single- or multiple-bus systems, no architecture has been built that supports coherence through hardware, although proposals abound. The problem is compounded by the complexity of obtaining credible performance data. Proposals under study include architectures with multistage interconnection networks, multilevel hierarchical-cache architectures, and arrays of processors connected by buses. 

The first article in this special issue, Per Stenstrom’s “A Survey of Cache-Coherence Schemes for Multiprocessors,” reviews these proposals as well as cache protocols. The final article, “New Directions in Scalable, Shared-Memory Multiprocessor Architectures,” is devoted to more detailed descriptions of three recent proposals. 

Hardware/compiler trade-offs to support efficient cache coherence. Three solutions are possible. First, all data are cached and coherence is maintained by the hardware. While this solution maximizes the hit ratio on all data, it may be very complex to implement in large-scale systems. 

Second, shared writable data are not cached. Following the programmer’s directives, the compiler allocates shared writable data to noncacheable regions of memory. The obvious drawback of such a drastic scheme is that large data structures cannot be cached, although most of the time it would be safe to do so. 

Third, compromise solutions are based on the caching of shared writable data when it is safe to do so and on selective cache flushing when coherence problems occur. These solutions require hardware support but appear more scalable than fully hardware-supported cache coherence. One of the articles in this special issue, “Compiler-Directed Cache Management in Multiprocessors” by Hoichi Cheong and Alexander Veidenbaum, introduces such a scheme. 

Performance evaluation of cache-based multiprocessors. Measurements, trace-driven simulations, and analytical models are techniques for determining the benefits of caches in multiprocessors. In “Directory-Based Cache Coherence in Large-Scale Multiprocessors,” David Chaiken, Craig Fields, Kiyoshi Kurihara, and Anant Agarwal use these techniques to compare the performance of various directory-based coherence schemes over a range of parallel applications. 

Reliable and efficient interprocessor synchronization. In a cache-based system, synchronization protocols based on busy-waiting can cause intense coherence activity among caches (also referred to as “ping-ponging”). When a processor spins on a lock, the reading of the lock should be done on nonexclusive (nonunique) copies. Only when the processor acquires the lock should it obtain an exclusive copy. Even then, the coherence traffic will be intense if multiple processors are spinning when the lock is released by another processor. 
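
The spinning discipline described above is commonly known as test-and-test-and-set, and a minimal sketch of it follows. It is only an illustration under stated assumptions: Python threads cannot show real cache traffic, the `threading.Lock` named `_atomic` merely stands in for a hardware atomic instruction, and the names `TTASLock` and `run_workers` are hypothetical.

```python
import threading
import time


class TTASLock:
    """Test-and-test-and-set spin lock (illustrative sketch)."""

    def __init__(self):
        self._held = False
        self._atomic = threading.Lock()   # stands in for a hardware atomic operation

    def _test_and_set(self):
        # Atomically return the old flag and set it (would need an exclusive copy).
        with self._atomic:
            old = self._held
            self._held = True
            return old

    def acquire(self):
        while True:
            while self._held:             # spin with plain reads (shared copy suffices)
                time.sleep(0)             # yield so the lock holder can run
            if not self._test_and_set():  # attempt the atomic step only when it looks free
                return

    def release(self):
        with self._atomic:                # keep the store atomic w.r.t. test-and-set
            self._held = False


def run_workers(num_threads=4, increments=100):
    """Increment a shared counter under the lock and return the final total."""
    lock = TTASLock()
    total = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            total[0] += 1                 # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Calling `run_workers(4, 100)` should return 400 if mutual exclusion holds. The inner read-only loop is what keeps waiting processors on nonexclusive copies; the burst of atomic attempts when the flag clears corresponds to the traffic spike the paragraph mentions.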

The performance of hardware and software synchronization protocols is analyzed in the article by Gary Graunke and Shreekant Thakkar, “An Analysis of Synchronization Algorithms for Shared-Memory Multiprocessors.” 

Efficient implementation of virtual memory through caching. In a virtual-memory environment, the benefit of caches is lost if address translations are not themselves cached so that they can be retrieved at cache speed. Address-translation caches are also called translation-lookaside buffers (TLBs). Besides the address translation, a TLB entry contains status information about the page it points to. Because the content of TLB entries can be modified, coherence problems exist among TLBs. These problems are addressed by Patricia Teller in her article, “Translation-Lookaside Buffer Consistency.” 

Ordering of shared-memory accesses to obtain simple and efficient concurrency models. In nonbus systems, where processors are very fast, memory accesses must be buffered, and the interconnection may not enforce atomic updates/invalidations of all copies. The traditional notion of coherence is not applicable any more. A concurrency model refers to the way the hardware is “seen” at the instruction-set level and is based on the overall ordering of shared-memory accesses. The classical concurrency model is sequential consistency. Another model, weak ordering, has been defined. It basically restricts correct ordering of accesses to explicit synchronization points. 2,3 

Design of caches and their associated protocol to increase processor efficiency. Uniprocessors are becoming very powerful. However, even with high hit ratios, processor efficiency can be reduced by the blocking time, or penalty, that results from a miss on shared-memory accesses and from coherence activity. 

Research has focused on optimizing the cache protocol to minimize its overhead and on designing nonblocking caches (also called lockup-free caches) that do not block the processor on a miss, but allow the processor to access the cache even if several misses are pending. 


The five articles and three short reports in this special issue are the result of careful selection and refereeing of 19 submissions. We hope that you find them both informative and useful. ■ 
A Survey of Cache 
Coherence Schemes for 
Multiprocessors 


Per Stenstrom 
Lund University 


Shared-memory multiprocessors have emerged as an especially cost-effective way to provide increased computing power and speed, mainly because they use low-cost microprocessors economically interconnected with shared memory modules. 

Figure 1 shows a shared-memory multiprocessor consisting of processors connected with the shared memory modules by an interconnection network. This system organization has three problems 1: 

(1) Memory contention. Since a memory module can handle only one memory request at a time, several requests from different processors will be serialized. 

(2) Communication contention. Contention for individual links in the interconnection network can result even if requests are directed to different memory modules. 

(3) Latency time. Multiprocessors with a large number of processors tend to have complex interconnection networks. The latency time for such networks (that is, the time a memory request takes to traverse the network) is long. 

These problems all contribute to increased 
memory access times and hence slow down 
the processors’ execution speeds. 

Cache coherence schemes tackle the problem of maintaining data consistency in shared-memory multiprocessors. They rely on software, hardware, or a combination of both. 

Cache memories have served as an important way to reduce the average memory access time in uniprocessors. The locality of memory references over time (temporal locality) and space (spatial locality) allows the cache to perform a vast majority of all memory requests (typically more than 95 percent); memory handles only a small fraction. It is therefore not surprising that multiprocessor architects also have employed cache techniques to address the problems pointed out above. Figure 2 shows a multiprocessor organization with caches attached to all processors. This cache organization is often called private, as opposed to shared, because each cache is private to one or a few of the total number of processors. 
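
To see why a hit ratio above 95 percent matters so much, consider a simple weighted-average model of access time. The sketch below is an illustration only: the function name and the timings (1 cycle for a cache hit, 20 cycles for a memory access) are assumed values, not figures from the article, and the model charges a miss the memory time alone.

```python
def average_access_time(hit_ratio, t_cache, t_memory):
    """Mean access time when hits cost t_cache and misses cost t_memory."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie in [0, 1]")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_memory


# Hypothetical timings: 1 cycle per cache hit, 20 cycles per memory access.
for h in (0.0, 0.90, 0.95, 0.99):
    print(f"hit ratio {h:.2f}: {average_access_time(h, 1, 20):.2f} cycles")
```

Under these assumed timings, a 95 percent hit ratio brings the average from 20 cycles down to about 1.95, and it also means only one request in twenty reaches the interconnection network and the memory modules.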

The private cache organization appears in a number of multiprocessors, including Encore Computer’s Multimax, Sequent Computer Systems’ Symmetry, and Digital Equipment’s Firefly multiprocessor workstation. These systems use a common bus as the interconnection network. Communication contention therefore becomes a primary concern, and the cache serves mainly to reduce bus contention. 

Other systems worth mentioning are 
RP3 from IBM, Cedar from the University 
of Illinois at Urbana-Champaign, and 
Butterfly from BBN Laboratories. These 
systems contain about 100 processors 
connected to the memory modules by a 
multistage interconnection network with a 
considerable latency. RP3 and Cedar also 
use caches to reduce the average memory 
access time. 

Shared-memory multiprocessors have an advantage: the simplicity of sharing code and data structures among the processes comprising the parallel application. Process communication, for instance, can be implemented by exchanging information through shared variables. This sharing can result in several copies of a shared block in one or more caches at the same time. To maintain a coherent view of the memory, these copies must be consistent. This is the cache coherence problem or the cache consistency problem. A large number of solutions to this problem have been proposed. 

This article surveys schemes for cache coherence. These schemes exhibit various degrees of hardware complexity, ranging from protocols that maintain coherence in hardware to software policies that prevent the existence of copies of shared, writable data. First we’ll look at some examples of how shared data is used. These examples help point out a number of performance issues. Then we’ll look at hardware protocols. We’ll see that consistency can be maintained efficiently, but in some cases with considerable hardware complexity, especially for multiprocessors with many processors. We’ll investigate software schemes as an alternative capable of reducing the hardware cost. 

Examples of algorithms 
with data sharing 

Cache coherence poses a problem mainly for shared, read-write data structures. Read-only data structures (such as shared code) can be safely replicated without cache coherence enforcement mechanisms. Private, read-write data structures might impose a cache coherence problem if we allow processes to migrate from one processor to another. Many commercial multiprocessors help increase throughput for multiuser operating systems where user processes execute independently with no (or little) data sharing. In this case, we need to efficiently maintain consistency for private, read-write data in the context of process migration. 

We will concentrate on the behavior of cache coherence schemes for parallel applications using shared, read-write data structures. To understand how the schemes work and how they perform for different uses of shared data structures, we will investigate two parallel applications that use shared data structures differently. These examples highlight a number of performance issues. 

We can find the first example (the well-known bounded-buffer producer and consumer problem) in any ordinary text on operating systems. Figure 3 shows it in a Pascal-like notation. The producer inserts a data item in the shared buffer if the buffer is not full. The buffer can store N items. The consumer removes an item if the buffer is not empty. We can choose the number of producers and consumers arbitrarily. The buffer is managed by a shared array, which implements the buffer, and 



Figure 1. An example of a shared memory multiprocessor. 1 



Figure 2. An example of a multiprocessor with private caches. 1 


Producer: 
    if count < N then 
        mutexbegin 
            buffer[ in ] := item; 
            in := (in + 1) mod N; 
            count := count + 1; 
        mutexend 

Consumer: 
    if count <> 0 then 
        mutexbegin 
            item := buffer[ out ]; 
            out := (out + 1) mod N; 
            count := count - 1; 
        mutexend 


Figure 3. Pascal-like code for the bounded-buffer problem. 


three shared variables: in, out, and count, 
which keep track of the next item and the 
number of items stored in the buffer. 
Semaphores (implemented by mutexbegin 


and mutexend) protect buffer operations, 
which means that one process at most can 
■enter the critical section at a time. 

The second example to consider is a
parallel algorithm for solving a linear system
of equations by iteration. It takes the
form

$x_{i+1} = A x_i + b$

where $x_{i+1}$, $x_i$, and $b$ are vectors of size N
and A is a matrix of size N×N. Suppose that
each iteration (the calculation of vector
$x_{i+1}$) is performed by N processes, where
each process calculates one vector element.
The code for this algorithm appears
in Figure 4. The termination condition does
not concern us here. Therefore, we assume
that it never terminates.

repeat
  par_for J := 1 to N do
  begin
    xtemp[ J ] := b[ J ];
    for K := 1 to N do
      xtemp[ J ] := xtemp[ J ] + A[ J,K ] * x[ K ];
  end;
  barriersync;
  par_for J := 1 to N do
    x[ J ] := xtemp[ J ];
  barriersync;
until false;

Figure 4. Pascal-like code for one iteration of the parallel algorithm for solving a
linear system of equations by iteration.

The par_for statement initiates N pro¬ 
cesses. Each process calculates a new 
value, which is stored in xtemp. The last 
parallel loop in the iteration copies back 
the elements of xtemp to vector x. This 
requires a barrier synchronization. The 
most important observations are 

(1) Vector b and matrix A are read- 
shared and can be safely cached. 

(2) All elements of vector x are read to 
calculate a new vector element. 


(3) All elements of vector x are updated 
in each iteration. 
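
As a rough sequential sketch of the loop body in Figure 4, written in Python rather than the Pascal-like notation of the figure, the two phases can be modeled as below. The function name `iterate_once` and the sample matrix are illustrative assumptions; each pass of the outer loop stands in for one of the N processes.

```python
def iterate_once(A, b, x):
    """Compute A*x + b once, mirroring the two phases of Figure 4 sequentially."""
    n = len(x)
    xtemp = [0.0] * n
    # First par_for: "process" j reads every element of x, row j of A, and b[j].
    for j in range(n):
        xtemp[j] = b[j]
        for k in range(n):
            xtemp[j] += A[j][k] * x[k]
    # barriersync: xtemp is complete before any element of x is overwritten.
    # Second par_for: "process" j writes only its own element of x.
    return list(xtemp)


# Hypothetical 2x2 example.
A = [[0.5, 0.0], [0.25, 0.5]]
b = [1.0, 2.0]
x1 = iterate_once(A, b, [0.0, 0.0])
x2 = iterate_once(A, b, x1)
```

The sketch makes the sharing pattern visible: every process reads all of x, but each element of x has a single writer.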

With these examples in mind, we will 
consider how the proposed schemes for 
cache coherence manage copies of the data 
structures. 

Proposed solutions range from hard- 
ware-implemented cache consistency 
protocols, which give software a coherent 
view of the memory system, to schemes 
providing varied hardware support but 
with cache coherence enforcement poli¬ 
cies implemented in software. We will 
focus on the implementation cost and per¬ 
formance issues of the surveyed schemes. 

Hardware-based protocols

Hardware-based protocols include
snoopy cache protocols, directory
schemes, and cache-coherent network
architectures. They all rely on a certain
cache coherence policy. Let’s start by looking
at the different policies.


Cache coherence policies. Hardware- 
based protocols for maintaining cache 
coherence guarantee memory system co¬ 
herence without software-implemented 
mechanisms. Typically, hardware mecha¬ 
nisms detect inconsistency conditions and 
perform actions according to a hardware- 
implemented protocol. 

Data is decomposed into a number of 
equally sized blocks. A block is the unit of 
transfer between memory and caches. 
Hardware protocols allow an arbitrary 
number of copies of a block to exist at the 
same time. There are two policies for 
maintaining cache consistency: write- 
invalidate and write-update. 

The write-invalidate policy maintains 
consistency of multiple copies in the fol¬ 
lowing way: Read requests are carried out 
locally if a copy of the block exists. When 
a processor updates a block, however, all 
other copies are invalidated. How this is 
done depends on the interconnection net¬ 
work used. (Ignore it for the moment.) A 
subsequent update by the same processor 
can then be performed locally in the cache, 
since copies no longer exist. Figure 5 
shows how this policy works. In Figure 5a, 
four copies of block X are present in the 
system (the memory copy and three cached 
copies). In Figure 5b, processor 1 has 
updated an item in block X (the updated 
block is denoted X') and all other copies are 
invalidated (denoted I). If processor 2 is¬ 
sues a read request to an item in block X’, 
then the cache attached to processor 1 
supplies it. 

The write-update policy maintains con¬ 
sistency differently. Instead of invalidat¬ 
ing all copies, it updates them as shown in 
Figure 5c. Whether the memory copy is 
updated or not depends on how this proto¬ 
col is implemented. We will look at that 
later. 

Consider the write-invalidate policy for 
the bounded-buffer problem, recalling the 
code in Figure 3. Suppose a producer 
process P and a consumer process C, exe¬ 
cuting on different physical processors, 
alternately enter the critical section in the 
following way: P enters the critical section 
K times in a row, then C enters the critical 
section K times in a row, and so forth. If 
K= 1, then count will be read and written by 
P and C, then P again, etc. This means there 
will be a miss on the read, then an invalida¬ 
tion on the write. Referred to as the ping- 
pong effect, this means data migrates back 
and forth between the caches, resulting in 
heavy network traffic. However, if the
producer process inserts consecutive items
in the buffer (that is, if K > 1), then the
reads and writes to count will be local. The
same holds for the consumer process.

Table 1. Comparison of the number of consistency actions generated by the
cache coherence policies for the example algorithms.

Policy            Communication cost   Bounded-buffer problem   Iterative algorithm
Write-invalidate  Invalidations        1                        N
                  Misses               1                        N
Write-update      Updates              K                        N

Now consider the write-update policy 
applied to the bounded-buffer problem. 
Here, note that the write to count generates 
a global update independent of the order of 
execution of P and C. Table 1 shows the 
communication cost associated with ac¬ 
cesses to the variable count for K consecu¬ 
tive executions of the critical section. 
Under the assumption that the communica¬ 
tion cost is the same for an invalidation as 
for an update and that the communication 
cost for a miss is twice that for an invalida¬ 
tion, then the break-even point of the 
communication cost between the two poli¬ 
cies is K= 3. 
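
The bounded-buffer column of Table 1 and the break-even point can be reproduced with a small trace-driven model. The sketch below is an illustration, not a description of any real protocol implementation: it assumes infinitely large caches, tracks only which caches hold a valid copy of a block, and uses the cost weights stated above (a miss costs twice an invalidation or an update). All function names are hypothetical.

```python
from collections import defaultdict


def simulate(trace, policy):
    """Count consistency actions for a trace of (processor, 'r' or 'w', block)."""
    caches = defaultdict(set)  # processor -> blocks held as valid copies
    counts = {"misses": 0, "invalidations": 0, "updates": 0}
    for proc, op, block in trace:
        if block not in caches[proc]:
            counts["misses"] += 1
            caches[proc].add(block)
        if op == "w":
            sharers = [p for p in caches if p != proc and block in caches[p]]
            if sharers and policy == "write-invalidate":
                counts["invalidations"] += 1
                for p in sharers:
                    caches[p].discard(block)
            elif sharers:  # write-update: copies stay valid
                counts["updates"] += 1
    return counts


def turn(proc, k):
    """k consecutive critical sections: each reads and then writes count."""
    return [(proc, "r", "count"), (proc, "w", "count")] * k


def steady_state_counts(k, policy):
    """Actions caused by one turn of P after a warm-up round of P and C."""
    warmup = turn("P", k) + turn("C", k)
    before = simulate(warmup, policy)
    after = simulate(warmup + turn("P", k), policy)
    return {name: after[name] - before[name] for name in after}


def cost(counts, miss=2, invalidation=1, update=1):
    return (miss * counts["misses"]
            + invalidation * counts["invalidations"]
            + update * counts["updates"])
```

Under these assumptions, write-invalidate costs 2 + 1 = 3 units per turn regardless of K, while write-update costs K units, so the two policies meet at K = 3.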

Now consider the iterative algorithm of 
Figure 4 and the write-invalidate protocol. 
Suppose the block size is exactly one vec¬ 
tor element and the cache is infinitely 
large. Observe first that accesses to matrix 
A and vector b will be local, since they are 
read-shared and will not be invalidated. 
However, each process will realize a read 
miss on every access to vector x, since all 
elements of x are updated (that is, all cop¬ 
ies are invalidated) in each iteration. Each 
process generates exactly one invalidation. 
Thus, each process will have N read misses 
and N invalidations in each iteration. 

If we instead consider the write-update 
policy, then all reads will be local but N 
global updates will be generated for each 
process. These observations are summa¬ 
rized in Table 1. Write-update performs 
better in terms of communication cost than 
does write-invalidate for this algorithm, 
with the same assumptions as for the 
bounded-buffer problem. 

The write-invalidate and write-update 
policies require that cache invalidation and 
update commands (collectively referred to 
as consistency commands) be sent to at 
least those caches having copies of the 
block. Until now we have not considered 
the implications of this for different net¬ 
works. In some networks (such as buses), it 
is feasible to broadcast consistency com¬ 
mands to all caches. This means that every 
cache must process every consistency 
command to find out whether it refers to 
data in the cache. These protocols are 
called snoopy cache protocols because 
each cache “snoops” on the network for 
every incoming consistency command. 

In other networks (such as multistage
networks), the network traffic generated
by broadcasts is prohibitive. Such systems
prefer to multicast consistency commands
exactly to those caches having a copy of
the block. This requires bookkeeping by
means of a directory that tracks all copies
of blocks. Hence, these protocols are called
directory schemes.

Figure 5. (a) Memory and three processor caches store consistent copies of block
X. (b) All copies except the one stored in processor 1’s cache are invalidated (I)
when processor 1 updates X (denoted X') if the write-invalidate policy is used. (c)
All copies (except the memory copy, which is ignored) are updated if the write-update
policy is used.

Figure 6. State-transition graph for states of cached copies for the write-once
protocol. Solid lines mark processor-initiated actions, and dashed lines mark
consistency actions initiated by other caches.

First we’ll look at different implementa¬ 
tions of snoopy cache protocols. Then 
we’ll look at directory schemes. While 
snoopy cache protocols rely on the use of 
buses, directory schemes can be used for 
general interconnection networks. Past 
work has also yielded proposals for cache- 
coherent network architectures supporting 
a large number of processors. 

Write-invalidate snoopy cache protocols.
Historically, Goodman proposed the
first write-invalidate snoopy cache proto¬ 
col, called write-once and reviewed by 
Archibald and Baer. 2 To understand the 
hardware complexity of the reviewed 
protocols, and certain possible optimiza¬ 
tions, we will take a rather detailed look at 
this protocol. 

The write-once protocol associates a 
state with each cached copy of a block. 
Possible states for a copy are 

• Invalid. The copy is inconsistent. 

• Valid. There exists a valid copy consistent
with the memory copy.

• Reserved. Data has been written ex¬ 
actly once and the copy is consistent with 
the memory copy, which is the only other 
copy. 

• Dirty. Data has been modified more 
than once and the copy is the only one in 
the system. 

Write-once uses a copy-back memory 
update policy, which means that the entire 
copy of the block must be written back to 
memory when it is replaced, provided that 
it has been modified during its cache resi¬ 
dence time (that is, the state is dirty). To 
maintain consistency, the protocol re¬ 
quires the following consistency com¬ 
mands besides the normal memory read 
block (Read-Blk) and write block (Write- 
Blk) commands: 

• Write-Inv. Invalidates all other copies 
of a block. 

• Read-Inv. Reads a block and invali¬ 
dates all other copies. 

State transitions result either from the 
local processor read and write commands
(P-Read and P-Write) or the consistency
commands (Read-Blk, Write-Blk, Write- 
Inv, and Read-Inv) incoming from the 
global bus. Figure 6 shows a state-transi¬ 
tion graph summarizing the actions taken 
by the write-once protocol. Solid lines 
mark processor-initiated actions, and 
dashed lines mark consistency actions ini¬ 
tiated by other caches and sent over the 
bus. 

The operation of the protocol can also be 
specified by making clear the actions taken 
on processor reads and writes. Read hits 
can always be performed locally in the 
cache and do not result in state transitions. 
For read misses, write hits, and write 
misses the actions occur as follows: 

• Read miss. If no dirty copy exists, then 
memory has a consistent copy and supplies 
a copy to the cache. This copy will be in the 
valid state. If a dirty copy exists, then the 
corresponding cache inhibits memory and 
sends a copy to the requesting cache. Both 
copies will change to valid and the memory 
is updated. 

• Write hit. If the copy is in the dirty or 
reserved states, then the write can be car¬ 
ried out locally and the new state is dirty. If 
the state is valid, then a Write-Inv consis¬ 
tency command is broadcast to all caches, 
invalidating their copies. The memory 
copy is updated and the resulting state is
reserved.

• Write miss. The copy either comes 
from a cache with a dirty copy, which then 
updates memory, or from memory. This is 
accomplished by sending a Read-Inv con¬ 
sistency command, which invalidates all 
cached copies. The copy is updated locally 
and the resulting state is dirty. 

• Replacement. If the copy is dirty, then 
it has to be written back to main memory. 
Otherwise, no actions are taken. 
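
The actions listed above can be summarized as a small state machine. The following Python sketch tracks the state of a single block in each cache and counts writes to memory; the class name `WriteOnceBlock` is hypothetical, replacement is left out, and the transition from reserved to valid when another cache reads the block is an assumption, since the list above does not spell it out.

```python
INVALID, VALID, RESERVED, DIRTY = "invalid", "valid", "reserved", "dirty"


class WriteOnceBlock:
    """State of ONE block in each of n caches under the write-once protocol."""

    def __init__(self, n_caches):
        self.state = [INVALID] * n_caches
        self.memory_writes = 0  # write-throughs plus updates from dirty copies

    def _invalidate_others(self, p):
        for q in range(len(self.state)):
            if q != p:
                self.state[q] = INVALID

    def read(self, p):
        if self.state[p] != INVALID:
            return  # read hit: served locally, no state transition
        # Read miss: a dirty copy, if any, supplies the block and updates memory.
        for q, s in enumerate(self.state):
            if s == DIRTY:
                self.memory_writes += 1
                self.state[q] = VALID
            elif s == RESERVED:
                self.state[q] = VALID  # assumption: no longer the only cached copy
        self.state[p] = VALID

    def write(self, p):
        s = self.state[p]
        if s in (DIRTY, RESERVED):
            self.state[p] = DIRTY  # write hit, carried out locally
        elif s == VALID:
            self._invalidate_others(p)  # Write-Inv broadcast
            self.memory_writes += 1     # the first write goes through to memory
            self.state[p] = RESERVED
        else:
            if DIRTY in self.state:     # write miss served by a dirty copy
                self.memory_writes += 1
            self._invalidate_others(p)  # Read-Inv
            self.state[p] = DIRTY


blk = WriteOnceBlock(2)
blk.read(0)   # cache 0: valid
blk.write(0)  # cache 0: reserved, memory updated once
blk.write(0)  # cache 0: dirty, no bus activity
blk.read(1)   # dirty copy supplies the block; both copies become valid
```

The name write-once is visible in the trace: only the first write to a valid copy reaches memory, and later writes stay in the cache.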

Other examples of proposed write- 
invalidate protocols include the Illinois 
protocol proposed by Papamarcos and 
Patel and the Berkeley protocol specifi¬ 
cally designed for the SPUR (Symbolic 
Processing Using RISC) multiprocessor 
workstation at the University of California 
at Berkeley (reviewed by Archibald and 
Baer 2 ). They improve on the management 
of private data (Illinois) and take into ac¬ 
count the discrepancy between the mem¬ 
ory and cache access times to optimize 
cache-to-cache transfers (Berkeley). 

Figure 7. State-transition graph for states of cached copies for the Firefly
protocol.

Write-update snoopy cache protocols.
An example of a write-update protocol, the
Firefly protocol, has been implemented in
the Firefly multiprocessor workstation
from Digital Equipment (reviewed by
Archibald and Baer 2 ). It associates three
possible states with a cached copy of a
block:

• Valid-exclusive. The only cached 
copy, it is consistent with the memory 
copy. 

• Shared. The copy is consistent, and 
there are other consistent copies. 

• Dirty. This is the only copy. The 
memory copy is inconsistent. 

The Firefly protocol uses a copy-back
update policy for private blocks and write-through
for shared blocks. The notion of
shared and private is determined at run time.

To maintain consistency, a write-update 
consistency command updates all copies. 
A dedicated bus line, denoted “shared 
line,” is used by the snooping mechanisms 
to tell the writer that copies exist. Figure 7 
summarizes the state transitions. 

The actions taken on a processor read or 
write follow: 

• Read miss. If there are shared copies, 
then these caches supply the block by 
synchronizing the transmission on the bus. 
If a dirty copy exists, then this cache sup¬ 
plies the copy and updates main memory. 
The new state in these cases is shared. If 
there is no cached copy, then memory 
supplies the copy and the new state is 
valid-exclusive. 

• Write hit. If the block is dirty or valid- 
exclusive, then the write can be carried out 
locally and the resulting state is dirty. If the 
copy is shared, all other copies (including 
the memory copy) are updated. If sharing 
has ceased (indicated by the shared line), 
then the next state is valid-exclusive. 

• Write miss. The copy is supplied either 
from other caches or from memory. If it 
comes from memory, then its loaded-in 
state is dirty. Otherwise, all other copies 
(including the memory copy) are updated 
and the resulting state is shared. 

• Replacement. If the state is dirty, then 
the copy is written back to main memory. 
Otherwise, no actions are taken. 
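
The Firefly actions can be sketched in the same style. This Python model is an illustration under simplifying assumptions: it follows one block only, represents the shared line by checking whether any other cache still holds a copy, and treats a write miss served by another cache as a read miss followed by a shared write. The class name `FireflyBlock` and its counters are hypothetical.

```python
VALID_EXCLUSIVE, SHARED, DIRTY = "valid-exclusive", "shared", "dirty"


class FireflyBlock:
    """State of ONE block in each of n caches under the Firefly protocol."""

    def __init__(self, n_caches):
        self.state = [None] * n_caches  # None means the block is not cached
        self.memory_writes = 0
        self.bus_updates = 0

    def _other_holders(self, p):
        return [q for q, s in enumerate(self.state) if q != p and s is not None]

    def _share_with(self, p, holders):
        if any(self.state[q] == DIRTY for q in holders):
            self.memory_writes += 1  # the dirty cache also updates main memory
        for q in holders:
            self.state[q] = SHARED
        self.state[p] = SHARED

    def _shared_write(self, p):
        self.bus_updates += 1    # other copies are updated over the bus
        self.memory_writes += 1  # write-through for shared blocks
        if not self._other_holders(p):  # shared line not asserted
            self.state[p] = VALID_EXCLUSIVE

    def read(self, p):
        if self.state[p] is not None:
            return  # read hit
        holders = self._other_holders(p)
        if holders:
            self._share_with(p, holders)
        else:
            self.state[p] = VALID_EXCLUSIVE  # supplied by memory

    def write(self, p):
        s = self.state[p]
        if s in (DIRTY, VALID_EXCLUSIVE):
            self.state[p] = DIRTY  # local write, copy-back policy
        elif s == SHARED:
            self._shared_write(p)
        else:  # write miss
            holders = self._other_holders(p)
            if holders:
                self._share_with(p, holders)
                self._shared_write(p)
            else:
                self.state[p] = DIRTY  # loaded from memory

    def evict(self, p):
        if self.state[p] == DIRTY:
            self.memory_writes += 1  # write back on replacement
        self.state[p] = None
```

Note that a shared copy only returns to valid-exclusive on a write that finds no other holder, which is why stale copies in other caches keep attracting updates until they are replaced.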

Another write-update protocol, the 
Dragon protocol (reviewed by Archibald 
and Baer 2 ), has been proposed for the 
Dragon multiprocessor workstation from 
Xerox PARC. To improve the efficiency of 
cache-to-cache transfers, it avoids updat¬ 
ing memory until a block is replaced. 


Implementation and performance issues for snoopy cache protocols.
Snoopy cache protocols are extremely
popular because of the ease of implemen¬ 
tation. Many commercial, bus-based 
multiprocessors have used the protocols 
we have investigated here. 1 For example, 
Sequent Computer Systems’ Symmetry
multiprocessor and Alliant Computer Systems’
Alliant FX use write-invalidate policies
to maintain cache consistency. The
DEC Firefly uses the write-update policy, 
as does the experimental Dragon worksta¬ 
tion mentioned above. 

The main differences between a snoopy 
cache and a uniprocessor cache are the 
cache controller, the information stored in 
the cache directory, and the bus controller. 
The cache controller is a finite-state ma¬ 
chine that implements the cache coherence 
protocol according to the state transition 
graphs of Figures 6 and 7. 

The cache directory needs to store the 
state for each block. Only two bits are 
needed for the protocols we have reviewed. 
The bus controller implements the bus- 
snooping mechanisms, which must monitor
every bus operation to discover whether
an action is needed. Since the snooping
mechanism must have access to the direc¬ 
tory, contention for the directory can arise 
between local requests and requests com¬ 
ing in from the bus. For that reason the 
directory is often duplicated. 

Another implementation issue concerns 
the bus design. To efficiently support the 
protocols reviewed here, certain bus lines 
are needed. One example we have seen is 
the shared line to support write-update 
policies. Therefore, dedicated bus stan¬ 
dards have been proposed such as the IEEE 
Futurebus (IEEE standard P896.1). 

Now let’s discuss the impact of certain 
cache parameters on the performance of 
snoopy cache protocols. We would use 
snoopy cache protocols mainly to reduce 
bus traffic, with a secondary goal of reduc¬ 
ing the average memory access time. An 
important question is how these metrics 
are affected by the block (line) size when 
using a write-invalidate protocol. 

For uniprocessor caches, bus traffic and
average access time mainly result from
cache misses, that is, references to data
that are not cache resident. Uniprocessor
cache studies have demonstrated that the
miss ratio decreases when the block size
increases. This results from spatial
locality, of code in particular, but also of data.
The miss ratio decreases until the block
size reaches a certain point, the data
pollution point, and then it starts to increase.
For larger caches the data pollution point
appears at a larger block size.

June 1990

Table 2. Bit overhead for proposed directory schemes.

Scheme      Overhead (No. of Bits)
Tang        $CB$
Censier     $M(B+N)$
Stenstrom   $C(B+N) + M \log_2 N$

Bus traffic per reference (in number of 
bus cycles) is proportional both to the miss 
ratio, M, and the number of words that 
must be transferred to serve a cache miss. 
If this number matches the block size, L, 
then average bus traffic per reference is B 
= M L. Hence, if the miss ratio decreases 
when the block size increases, bus traffic 
will not necessarily decrease. In fact, simu¬ 
lations have shown that bus traffic in¬ 
creases with block size for data refer¬ 
ences, 3 which suggests using a small block 
size in bus-based multiprocessors. 
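
A short numeric illustration of the relation B = M L follows. The miss ratios in the sketch are hypothetical values chosen only to show the effect; they are not measurements from the cited simulations.

```python
def bus_traffic_per_reference(miss_ratio, block_size_words):
    """Average bus traffic per reference, B = M * L, in words transferred."""
    return miss_ratio * block_size_words


# Hypothetical miss ratios that fall as the block size (in words) grows.
hypothetical_miss_ratio = {4: 0.10, 8: 0.07, 16: 0.05}

traffic = {
    block_size: bus_traffic_per_reference(m, block_size)
    for block_size, m in hypothetical_miss_ratio.items()
}
```

With these numbers the miss ratio halves between 4-word and 16-word blocks, yet the traffic per reference doubles, because each miss moves four times as many words.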

For write-invalidate protocols, a cache 
miss can result from an invalidation initi¬ 
ated by another processor prior to the cache 
access — an invalidation miss. Such 
misses increase bus traffic. Note that in¬ 
creasing the cache size cannot reduce in¬ 
validation misses. Eggers and Katz 4 have 
done extensive simulations based on paral¬ 
lel program traces (trace-driven simula¬ 
tion) to investigate the impact of block size 
on the miss ratio and bus traffic (see also 
their references to earlier work). One of 
their conclusions is that the total miss ratio 
generally exceeds that in uniprocessors. 
Moreover, it does not necessarily decrease 
when the block size increases, unlike 
uniprocessor cache behavior. This means 
that bus traffic in multiprocessors may 
increase dramatically when the block size 
increases. 

We can explain the main results by using 
the example algorithms from Figures 3 and 
4. Consider the bounded-buffer problem 
and the use of the shared array, buffer. If 
the line size matches the size of each item, 
then the consumer will experience an invalidation
miss on every access, assuming
that producers and consumers access the
critical section alternately. Note that if the
block size increases, the invalidation miss
ratio remains the same but bus traffic increases.
However, with a larger block size,
consumers could benefit from a decreased
miss ratio if the same consumer process
accessed the critical section several times
in a row.
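
The effect of block size on the consumer's misses can be checked with a tiny model of the shared buffer. The sketch assumes a write-invalidate policy, one producer and one consumer, an infinitely large consumer cache, and buffer indices that never wrap around; the function name `consumer_misses` and its parameters are hypothetical.

```python
def consumer_misses(n_items, block_size, run_length):
    """Misses seen by the consumer when the producer writes run_length items,
    the consumer then reads them, and the pattern repeats.

    n_items is assumed to be a multiple of run_length.
    """
    consumer_cache = set()  # blocks currently valid in the consumer's cache
    misses = 0
    produced = consumed = 0
    while consumed < n_items:
        for _ in range(run_length):
            # A producer write invalidates the consumer's copy of that block.
            consumer_cache.discard(produced // block_size)
            produced += 1
        for _ in range(run_length):
            block = consumed // block_size
            if block not in consumer_cache:
                misses += 1
                consumer_cache.add(block)
            consumed += 1
    return misses
```

With strict alternation (`run_length` of 1) the consumer misses on every item whether a block holds one item or four, so a larger block only adds traffic. With runs of four items and four items per block, the misses drop to one per block.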

For the iterative algorithm from Figure 
4, increasing the block size reduces the 
miss ratio for accesses to vector x, since all 
elements of the block are accessed once. 
Accesses to vector xtemp, however, expe¬ 
rience a higher miss ratio because each 
write to xtemp invalidates all copies. This 
means that, in the worst case, a read miss 
for xtemp results for each iteration in the 
inner loop. 

Even if the spatial locality with respect 
to a process is high, this does not necessar¬ 
ily suggest a large block size. It depends on 
the effect of accesses by all processes 
sharing the block. For shared data usage 
where data are exclusively accessed by one 
process for a considerable amount of time, 
increasing the block size may reduce the 
invalidation miss ratio. 

For write-update protocols, the block 
size is not an issue because misses are not 
caused by consistency-related actions. 
Moreover, the frequency of global updates 
does not depend on the block size. A poten¬ 
tial problem, however, is that write-update 
protocols tend to update copies even if they 
are not actively used. Note that a copy 
remains in the cache until replaced, since 
write-update protocols never invalidate 
copies. This effect is more pronounced for
large caches, which help multiprocessors 
reduce the miss ratio and the resulting bus 
traffic. 

An important performance issue for 
write-invalidate policies concerns reduc¬ 
ing the number of invalidation misses. For 
write-update policies, an important issue 
concerns reducing the sharing of data to 
lessen bus traffic. Now let’s survey some 
extensions to the two types of protocols 
that address these issues. 

Snoopy cache protocol extensions. 
The write-invalidate protocol may lead to 
heavy bus traffic caused by read misses 
resulting from iterations where one pro¬ 
cessor updates a variable and a number of 
processors read the same variable. This 
happens with the iterative algorithm shown 
in Figure 4. The number of read misses 
could be reduced considerably if, upon a 
read miss, the copy were distributed to all 
caches with invalid copies. In that case, the
N read misses per iteration and per process
could be eliminated for all processes but
one. Such an extension to the write-invalidate
protocol, called read-broadcast, was
proposed by Rudolph and Segall. 5

As noted for the write-update protocol, 
data items might be updated even if never 
accessed by other processors. This could 
happen with the bounded-buffer problem 
of Figure 3 if a consumer process migrates 
from one processor to another. In this case, 
parts of the buffer might remain in the old 
cache until replaced. This can take a very 
long time if the cache is large. While in the 
cache, the buffer generates heavy network 
traffic because of the broadcast updates. 

One approach measures the break-even 
point when the communication cost (in 
terms of bus cycles for updating a block) 
exceeds the cost of handling an invalida¬ 
tion miss. Assuming that a miss costs twice 
as much as a global update, then the break¬ 
even point appears when two consecutive 
updates have taken place with no interven¬ 
ing local accesses. We could implement 
this scheme by adding two cache states that 
determine when the break-even point is 
reached. Karlin et al. 6 proposed and evalu¬ 
ated a number of such extensions, called 
competitive snooping, and Eggers and Katz 
evaluated the performance benefits of 
these extensions. 4 
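
One way to picture the break-even rule is a per-copy counter of updates received since the last local access. The Python sketch below is an illustration of that idea under the stated cost assumption (threshold of two updates); it is not the specific state encoding of any published competitive snooping scheme, and the class name `CompetitiveCopy` is hypothetical.

```python
class CompetitiveCopy:
    """A cached copy that invalidates itself after too many unused updates."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # break-even point, in consecutive updates
        self.valid = True
        self.unused_updates = 0     # updates since the last local access
        self.misses = 0

    def local_access(self):
        if not self.valid:
            self.misses += 1        # the copy must be fetched again
            self.valid = True
        self.unused_updates = 0

    def remote_update(self):
        """Return True if this copy still had to receive the update."""
        if not self.valid:
            return False
        self.unused_updates += 1
        if self.unused_updates >= self.threshold:
            self.valid = False      # stop receiving further updates
        return True
```

A copy that is read between updates behaves as under plain write-update, while an abandoned copy costs at most two updates plus one later miss if it is ever needed again.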

Directory schemes. We have seen that 
even using large caches cannot entirely 
eliminate bus traffic because of the consis¬ 
tency actions introduced as a result of data 
sharing. This puts an upper limit on the 
number of processors that a bus can ac¬ 
commodate. For multiprocessors with a 
large number of processors — say, 100 — 
we must use other interconnection net¬ 
works, such as multistage networks. 

Snoopy cache protocols do not suit 
general interconnection networks, mainly 
because broadcasting reduces their per¬ 
formance to that of a bus. Instead, consis¬ 
tency commands should be sent to only 
those caches that have a copy of the block. 
To do that requires storing exact informa¬ 
tion about which caches have copies of all 
cached blocks. 

We will survey different approaches 
proposed in the literature. Note that this 
issue can be considered orthogonal to the 
choice of cache coherence policy. There¬ 
fore, keep in mind that either write-invali¬ 
date or write-update would serve. Cache 
coherence protocols that somehow store 
information on where copies of blocks 
reside are called directory schemes. Agarwal
et al. surveyed and evaluated directory
schemes in their work. 7

Figure 8. Actions taken on a read miss (thin lines) and a write hit (bold lines) for
the write-invalidate implementation of (a) the Censier scheme and (b) the
Stenstrom scheme.

The proposed schemes differ mainly in 
how the directory maintains the informa¬ 
tion and which information the directory 
stores. Tang proposed the first directory 
scheme (reviewed by Agarwal et al. 7 ). He 
suggested a central directory containing 
duplicates of all cache directories. The 
information stored in the cache directory 
depends on the coherence policy em¬ 
ployed. The main point is that the directory 
controller can find out which caches have 
a copy of a particular block by searching 
through all duplicates. 

Censier and Feautrier proposed another 
organization of the directory (also re¬ 
viewed by Agarwal et al. 7 ). Associated 
with each memory block is a bit vector, 
called the presence flag vector. One bit for 
each cache indicates which caches have a 
copy of the block. Some status bits are 
needed, depending on the cache coherence 
policy used. 
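
A presence flag vector maps naturally onto an integer used as a bit set. The sketch below assumes a write-invalidate policy and a single dirty status bit; the class name `PresenceEntry` and its methods are hypothetical and only illustrate the bookkeeping per memory block.

```python
class PresenceEntry:
    """Directory entry for one memory block: presence flag vector plus a dirty bit."""

    def __init__(self, n_caches):
        self.n_caches = n_caches
        self.presence = 0    # bit c is set when cache c holds a copy
        self.dirty = False   # assumed status bit for write-invalidate

    def sharers(self):
        return [c for c in range(self.n_caches) if (self.presence >> c) & 1]

    def record_read(self, cache):
        """A read miss by `cache` adds it to the vector; the block is clean again."""
        self.presence |= 1 << cache
        self.dirty = False

    def record_write(self, cache):
        """Return the caches that must be invalidated when `cache` writes."""
        targets = [c for c in self.sharers() if c != cache]
        self.presence = 1 << cache
        self.dirty = True
        return targets
```

Because the vector names the sharers exactly, invalidations go only to the returned caches instead of being broadcast.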

In an earlier work, I proposed a different 
way of storing the directory information. 8 
Instead of associating the state information 
and the presence flag vector with the mem¬ 
ory copy, this information is associated 
with the cached copy. Let’s call this the 
Stenstrom scheme. 

First we compare the implementation 
cost in terms of the number of bits needed 
to store the information. Given M memory 
blocks, C cache lines, N caches, and B bits 
for state information for each cache block, 
Table 2 shows the overhead for each 
scheme. 

From Table 2 we see that the Tang 
scheme has the least overhead, provided 
that C<M. However, this scheme has two 
major disadvantages. First, the directory is 
centralized, which can introduce severe 
contention. Second, the directory control¬ 
ler must search through all duplicates to 
find which caches have copies of a block. 

In the other schemes, state information 
is distributed over memory or cache mod¬ 
ules, which reduces contention. Further¬ 
more, for both schemes the presence flag 
vector stores the residency of copies, elimi¬ 
nating the need for the search associated 
with the Tang scheme. This simplification 
does not come for free. In the Censier 
scheme, overhead is proportional to mem¬ 
ory size; in the Stenstrom scheme, it is 
proportional to cache size. The last scheme
needs the identity of the current owner in
memory. This requires an additional $\log_2 N$
bits. The bit overhead for both schemes is
prohibitive for multiprocessors with a 
large number of processors because of the 
size of the presence flag vector. 
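
The formulas of Table 2 are easy to evaluate for a concrete configuration. In the sketch below the function names and the sample sizes are hypothetical, and rounding the owner pointer up to a whole number of bits is an added assumption.

```python
from math import ceil, log2


def tang_bits(M, C, N, B):
    return C * B


def censier_bits(M, C, N, B):
    return M * (B + N)


def stenstrom_bits(M, C, N, B):
    # The owner identity needs log2(N) bits per memory block (rounded up here).
    return C * (B + N) + M * ceil(log2(N))


# Hypothetical system: 1,024 memory blocks, 64 cache lines, 16 caches, 2 state bits.
sizes = dict(M=1024, C=64, N=16, B=2)
overhead = {
    "Tang": tang_bits(**sizes),
    "Censier": censier_bits(**sizes),
    "Stenstrom": stenstrom_bits(**sizes),
}
```

For this configuration the Tang scheme needs 128 bits, the Stenstrom scheme 5,248, and the Censier scheme 18,432, which shows how the presence flag vector per memory block dominates once M is much larger than C.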


To get an insight into the reduction of 
network traffic over snoopy cache proto¬ 
cols, assume that the directory organiza¬ 
tions presented above support the write- 
invalidate cache coherence scheme. Since 
the Tang and Censier schemes differ only 
in the directory implementation, we will 
consider only the Censier and Stenstrom 
schemes. 

Let’s concentrate on how read misses 
and write hits are handled. Previous proto¬ 
col descriptions have already shown how 
other actions are handled. In the following, 
assume the system contains exactly one 
dirty copy. Figure 8 shows the control flow 
of consistency actions. 

In the Censier scheme, a read miss at 
cache 2 results in a request sent to the 
memory module. The memory controller
retransmits the request to the dirty cache.
This cache writes back its copy. The 
memory module can then supply a copy to 
the requesting cache. These actions appear 
in Figure 8a as thin lines. If a write hit is 
generated at cache 1, then a command is 
sent to the memory controller, which sends 
invalidations to all caches marked in the 
presence flag vector (cache 2) in Figure 8a. 
Bold lines mark these actions in Figure 8a. 

Considering the Stenstrom scheme, a 
read miss at cache 2 results in a request sent 
to the memory module. The memory con¬ 
troller retransmits the request to the dirty 
cache. Instead of writing back its copy, the 
cache supplies the copy directly to cache 2. 
These actions appear in Figure 8b as thin 
lines. If a write hit is generated at cache 1, 
then invalidation commands are sent directly
to all caches marked in the presence
flag vector (cache 2) in Figure 8b. Bold
lines mark these actions in Figure 8b.

Table 3. Cache pointer bit overhead
for a full-map, limited, and chained directory
scheme.

Scheme      Overhead (No. of Bits)
Stenstrom   $M \log_2 N + CN$
Limited     $i M \log_2 N$
Chained     $M \log_2 N + C \log_2 N$

Assuming the bounded-buffer problem 
of Figure 3 and the count variable, some 
important points come up. First, invalida¬ 
tions will be sent to only one cache because 
there will be at most one copy of count. 
This is very important because if broad¬ 
casts were generated, this would result in 
immense network traffic. Second, in both 
cases read misses to dirty blocks must 
traverse the network twice. Third, the 
Censier scheme requires sending a request 
to the memory controller for each invalida¬ 
tion. In the Stenstrom scheme, invalida¬ 
tions can be sent directly because the pres¬ 
ence flag vector is stored in the cache. The 
price for this, however, is that the presence 
flag vector must be fetched from the cur¬ 
rent owner if the block is not owned (the 
processor does not have write permission), 
which results in considerable network traf¬ 
fic for large presence flag vectors. For ap¬ 
plications with one writer to a block, as is 
the case for the iterative algorithm in Fig¬ 
ure 4, this overhead stays small. 

The directory schemes presented have 
the main advantage of being able to restrict 
the consistency commands to those caches 
having copies of a block. They are called 
full-map directory schemes because they 
can track copies in an arbitrary number of
caches. However, they are expensive to
implement, especially for multiprocessors 
containing many processors. 

There are different alternatives to re¬ 
duce the directory size. One method, called 
the limited directory scheme, restricts the 
number of cache pointers to less than the 
actual number of caches. Given N caches 
and i pointers in each directory entry, 
where i < N, then $i \log_2 N$ bits are needed to
track copies of blocks for each memory
block. A key question for limited directory
schemes is how to handle cases where
more than i copies are requested. Two
alternatives are possible: Either disallow 
more than i copies or start to broadcast 
when more than i copies exist. Clearly the 
success of a limited directory scheme de¬ 
pends on the degree of sharing, that is, the 
number of processors that simultaneously 
share data. 
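
The two ways of handling pointer overflow can be contrasted in a few lines of Python. This is a sketch under assumptions: the class name `LimitedEntry` is hypothetical, and the no-broadcast variant frees a pointer by invalidating the oldest recorded copy, which is one possible way to disallow more than i copies.

```python
class LimitedEntry:
    """Directory entry with at most i cache pointers for one memory block."""

    def __init__(self, i, broadcast):
        self.i = i
        self.broadcast = broadcast  # True: broadcast on overflow; False: disallow
        self.pointers = []
        self.overflow = False       # set once copies are no longer tracked exactly

    def add_sharer(self, cache):
        """Record a new copy; return caches that must give up their copy first."""
        if cache in self.pointers or self.overflow:
            return []
        if len(self.pointers) < self.i:
            self.pointers.append(cache)
            return []
        if self.broadcast:
            self.overflow = True    # from now on invalidations are broadcast
            return []
        victim = self.pointers.pop(0)  # assumed policy: evict the oldest pointer
        self.pointers.append(cache)
        return [victim]

    def invalidation_targets(self, writer, n_caches):
        if self.overflow:
            return [c for c in range(n_caches) if c != writer]
        return [c for c in self.pointers if c != writer]
```

As long as no more than i caches share a block, both variants behave like a full-map directory; they differ only once the degree of sharing exceeds i.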

Agarwal et al. 7 introduced a classification
of directory schemes. They referred to
a directory scheme as Dir_i X, where i is the
number of cache pointers for each block
and X denotes whether the scheme broadcasts
consistency commands (X = B) when
the number of copies exceeds the number
of cache pointers, or whether it disallows
more than i copies (X = NB). Their terminology
denotes the full-map schemes as
Dir_N NB and the limited directory schemes
with broadcast capability as Dir_i B, where
i < N.
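
The two overflow policies can be made concrete with a small model. The following is a minimal sketch, not the design of any particular machine: the class name, the first-in-first-out choice of which copy to evict in the NB case, and the method names are all assumptions made for illustration.

```python
class LimitedDirectoryEntry:
    """One directory entry with at most `num_pointers` cache pointers (Dir_i X)."""

    def __init__(self, num_pointers, broadcast):
        self.num_pointers = num_pointers  # i in the text
        self.broadcast = broadcast        # True models Dir_i B, False models Dir_i NB
        self.pointers = []                # caches known to hold a copy
        self.overflow = False             # True once the pointers no longer cover all copies

    def add_sharer(self, cache_id):
        """Record a new copy; return the caches whose copies must be invalidated now."""
        if cache_id in self.pointers or self.overflow:
            return []
        if len(self.pointers) < self.num_pointers:
            self.pointers.append(cache_id)
            return []
        if self.broadcast:
            # Dir_i B: allow the extra copy but give up exact tracking.
            self.overflow = True
            return []
        # Dir_i NB: never allow more than i copies, so evict the oldest one
        # (the FIFO choice is an assumption of this sketch).
        victim = self.pointers.pop(0)
        self.pointers.append(cache_id)
        return [victim]

    def invalidation_targets(self, writer, all_caches):
        """Caches that receive an invalidation when `writer` updates the block."""
        if self.overflow:
            targets = [c for c in all_caches if c != writer]     # broadcast
        else:
            targets = [c for c in self.pointers if c != writer]  # multicast
        self.pointers = [writer]
        self.overflow = False
        return targets
```

With two pointers, a third reader forces an invalidation under NB, whereas under B the third copy is admitted and the next write reaches every cache.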

One possible way of reducing the size of 
the directory for Dir_N NB schemes is to link
in a list all caches that store a copy of a
block. We could do this by associating an
entry including log2 N bits with each cache
line and memory block. This entry con¬ 
tains a pointer to the next cache that stores 
a copy. This scheme, called a chained di¬ 
rectory scheme, routes consistency com¬ 
mands to only those caches having copies 
of a block. However, when we compare the 
chained directory scheme with the full- 
map directory schemes, we find that multi¬ 
cast operations may take longer to per¬ 
form, thus slowing the processors. The 
Scalable Coherent Interface (IEEE P1596) 9 
proposes a chained directory scheme. 
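
To see why multicasts may take longer in a chained directory, consider the sketch below. It is an illustration only: the data layout (a head pointer per memory block, a next pointer per cached copy) follows the description above, but the class, the insertion of new sharers at the head of the list, and the method names are assumptions and are not taken from the Scalable Coherent Interface proposal.

```python
class ChainedDirectory:
    """Singly linked list of sharers per block (hypothetical model)."""

    def __init__(self):
        self.head = {}  # block -> first cache in the chain (memory-side pointer)
        self.next = {}  # (cache_id, block) -> next cache in the chain, or None

    def add_sharer(self, block, cache_id):
        if (cache_id, block) in self.next:
            return  # already in the chain
        # The new sharer becomes the head and points at the previous head.
        self.next[(cache_id, block)] = self.head.get(block)
        self.head[block] = cache_id

    def invalidate(self, block, writer):
        """Walk the chain one cache at a time; return the caches invalidated, in order."""
        invalidated = []
        current = self.head.get(block)
        while current is not None:
            following = self.next.pop((current, block))
            if current != writer:
                invalidated.append(current)
            current = following
        # After the write, the writer holds the only copy.
        self.head[block] = writer
        self.next[(writer, block)] = None
        return invalidated
```

Each invalidation can be issued only after the previous cache has revealed its successor, so the operation is sequential in the number of copies, whereas a full-map entry names all targets at once.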

An example of an extremely cost-effec¬ 
tive directory scheme that relies on broad¬ 
casting consistency commands (denoted 
Dir_0 B) is the one proposed by Archibald
and Baer (reviewed by Agarwal et al. 7).
Each directory entry consists of two bits 
encoding four global states of a memory 
block: not present in any cache, clean in 
exactly one cache, clean in an unknown 
number of caches, and dirty in exactly one 
cache. When a processor updates a block, 
an invalidation is broadcast to all caches. 
This generates immense network traffic. 
Nevertheless, this scheme is scalable in the 
sense that the number of caches can in¬ 
crease without changing the directory 
structure. 

Table 3 compares the bit overhead re¬ 
quired for cache pointers for a full-map 
(Stenstrom) scheme, a limited scheme with 
i pointers, and a chained directory scheme. 
Assume M memory blocks, C cache lines, 
and N caches. The chained directory is 
cheaper than the Stenstrom scheme. However,
the Stenstrom scheme sends consistency
commands directly to other caches
without having to traverse the chain of 
cache pointers. 
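
A rough feel for how pointer storage scales can be had from the short calculation below. It is a sketch under stated assumptions, not a reproduction of Table 3: C is taken to be the number of lines in each cache, the full-map figure is the classical one presence bit per cache per memory block rather than the Stenstrom organization, and the function names are hypothetical.

```python
def pointer_bits(num_caches):
    """Bits needed to name one of `num_caches` caches, i.e. ceil(log2 N)."""
    return (num_caches - 1).bit_length()


def directory_bits(M, C, N, i):
    """Total pointer storage in bits for M memory blocks, C lines per cache, N caches."""
    p = pointer_bits(N)
    return {
        # Assumption: classical full map, N presence bits per memory block.
        "full_map_presence_bits": M * N,
        # From the text: i pointers of log2 N bits per memory block.
        "limited": M * i * p,
        # From the text: one log2 N-bit pointer per memory block and per cache line.
        "chained": (M + N * C) * p,
    }
```

For example, `directory_bits(1024, 64, 16, 2)` gives 16,384 bits for the presence-bit full map and 8,192 bits each for the limited and chained organizations; the full-map figure grows linearly with N, the other two only logarithmically per entry.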

The full-map directory schemes have 
the advantage of reducing network traffic 
caused by invalidations or updates by 
multicasting them only to those caches 
with copies of a block. However, the 
amount of memory needed tends to be 
prohibitive for multiprocessors with many 
processors. Reducing the number of cache 
pointers, that is, employing limited direc¬ 
tory schemes, alleviates this problem. The 
price for this is limiting the number of 
copies that may simultaneously coexist in 
different caches or introducing peaks of 
network traffic due to broadcasting of 
consistency commands. Consequently, a 
trade-off exists between network traffic 
and directory size. No commercial im¬ 
plementation yet uses directory schemes. 

Another article in this issue, written by 
Chaiken et al., compares the performance 
of various directory schemes through a 
number of benchmark applications. 

Cache-coherent network architec¬ 
tures. The real success of shared-memory 
multiprocessors lies in designs that pro¬ 
vide a large number of processors inter¬ 
connected in an economical way. We have 
seen that a common bus does not suit 
hundreds of processors. Multistage net¬ 
works have problems, too, because of the 
hardware complexity for many processors. 
Therefore, researchers have proposed 
multiprocessors with a hierarchy of buses, 
in which network traffic is reduced by 
hierarchical cache-coherence protocols 
that don’t suffer from the implementation 
complexity of directory schemes. Let’s 
review three novel architectures based on 
this approach. 

The first one, the hierarchical cache/bus 
architecture proposed by Wilson, 10 ap¬ 
pears in Figure 9. We can view this archi¬ 
tecture as a hierarchy of caches/buses 
where a cache contains a copy of all blocks 
cached underneath it. This requires large 
higher level cache modules. Memory 
modules connect to the topmost bus. 

To maintain consistency among copies, 
Wilson proposed an extension to the write- 
invalidate protocol. Consistency among 
copies stored at the same level is main¬ 
tained in the same way as for traditional 
snoopy cache protocols. However, an in¬ 
validation must propagate vertically to 
invalidate copies in all caches. Suppose 
that processor P1 issues a write (see Figure
9). The write request propagates up to the
highest level and invalidates every copy.
Consequently, copies in Mc20, Mc22,
Mc16, and Mc18 will be invalidated. However,
higher order caches such as Mc20
keep track of dirty blocks beneath them. A
subsequent read request issued by P ? will
propagate up the hierarchy because no
copies exist. When it reaches the topmost
level, Mc20 issues a flush request down to
Mc11 and the dirty copy is supplied to the
cache of processor P ?.

Note that higher level caches act as fil¬ 
ters for consistency actions; an invalida¬ 
tion command or a read request will not 
propagate down to subsystems that don’t 
contain a copy of the corresponding block. 
This means that Mc21 in the example above
acts as a filter for the invalidations on the 
topmost cache, since this subsystem has no 
copies. 

The next architecture is the Wisconsin 
Multicube, proposed by Goodman and 
Woest. 11 As shown in Figure 10, it consists 
of a grid of buses with a processing ele¬ 
ment in each switch and a memory module 
connected to each column bus. A process¬ 
ing element consists of a processor, a 
cache, and a snoopy cache controller con¬ 
nected to the row and column bus. The 
snoopy cache is large (comparable to the 
size of main memory in a uniprocessor) to 
reduce network traffic. The large caches 
mean bus traffic results mainly from con¬ 
sistency actions. 

As in the hierarchical cache/bus system,
a write-invalidate protocol maintains
consistency. Invalidations are broadcast 
on every row bus, while global read re¬ 
quests are routed to the closest cache with 
a copy of the requested block. This is 
supported in the following way: Each
block has a “home column” corresponding 
to the memory module that stores the 
block. A block can be globally modified or 
unmodified. If the block is globally modi¬ 
fied, then there exists only one copy. Each 
cache controller stores in its column the 
identification of all modified blocks, 
which serves as routing information for 
read requests. A read request is broadcast 
on the row bus and routed to the column 
bus where the modified block is stored. 

The Data Diffusion Machine 12 is an¬ 
other hierarchical cache-coherent archi¬ 
tecture quite similar to Wilson’s architec¬ 
ture. It consists of a hierarchy of buses with 
large processor caches (on the order of a 
megabyte) at the lowest level, which is the 
only type of memory in the system. A 
hierarchical write-invalidate protocol 
maintains consistency. Unlike Wilson’s 
architecture, higher order caches (such as
the level-2 caches in Figure 9) contain only
state information, which considerably
reduces memory overhead. We pay a price
for this: Certain read requests must be sent
to the root and then down to a leaf and back
again because an intermediate-level cache
cannot satisfy them.

Figure 9. Wilson’s hierarchical cache/bus architecture.

Figure 10. Goodman and Woest’s Wisconsin Multicube.

repeat
  par_for J := 1 to N do
  begin
    xtemp[ J ] := b[ J ];          -- cache-read(b[ J ])
    for K := 1 to N do
      xtemp[ J ] :=
        xtemp[ J ] +               -- cache-read(xtemp[ J ])
        A[ J,K ] *                 -- cache-read(A[ J,K ])
        x[ K ];                    -- memory-read(x[ K ])
  end;
  barriersync;                     -- cache-invalidate
  par_for J := 1 to N do
    x[ J ] := xtemp[ J ];          -- memory-read(xtemp[ J ])
  barriersync;                     -- cache-invalidate
until false;

Figure 11. Example of reference marking of the iterative algorithm.

Interestingly, the global memory has 
been distributed to the processors. In con¬ 
junction with the cache coherence proto¬ 
col, this allows an arbitrary number of 
copies. Since data items have no home 
locations, as opposed to the Wilson and 
Wisconsin Multicube architectures, they 
will “diffuse” to those memory modules 
where they are needed. The Data Diffusion 
Machine is currently being built at the 
Swedish Institute of Computer Science. 

Compared to the full-map directory 
schemes, these architectures are more cost- 
effective in terms of memory overhead and 
constitute an interesting extension to bus- 
based architectures for large shared-mem¬ 
ory multiprocessors. However, it is too 
early to tell whether implementations will 
prove efficient. 

Software-based schemes

Software cache-coherence schemes at¬ 
tempt to avoid the need for complex hard¬ 
ware mechanisms. Let’s take a look at 
some of the proposals. 

How to prevent inconsistent cache 
copies. Hardware-based protocols effec¬ 
tively reduce network traffic. However, 
we pay for this with complex hardware
mechanisms, especially for multiprocessors
with a large number of processors.

An alternative would prevent the exis¬ 
tence of inconsistent cached data by limit¬ 
ing the caching of a data structure to safe 
times. This makes it necessary to analyze 
the program to mark variables as cacheable 
or noncacheable, which a sophisticated 
compiler or preprocessor can do. The most 
trivial solution would be to mark all shared 
read-write variables as noncacheable. This 
is too conservative, since shared data struc¬ 
tures can be exclusively accessed by one 
process or are read-only during a consider¬ 
able amount of time. During such intervals 
it is safe to cache a data structure. 

A better approach would let the com¬ 
piler analyze when it is safe to cache a 
shared read-write variable. During such 
intervals it marks the variable as cache- 
able. At the end of the interval, main 
memory must be consistent with the 
cached data, and cached data must be made 
inaccessible from the cache by invalida¬ 
tion. This raises the following key ques¬ 
tions: How does the compiler mark a vari¬ 
able as cacheable, and how is data invali¬ 
dated? 

The following survey of software-based 
cache coherence schemes will address 
these issues. Consult Cheong and Veidenbaum 13
for references to further reading.
See also the article written by Cheong and 
Veidenbaum in this issue. 

Cacheability marking. We can base 
the reference marking of a shared variable 
on static partitioning of the program into 
computational units. Accesses to a shared 
variable in one computational unit might
differ from those of another computational
unit. For example, the accesses might be 
one of the following types: 

(1) Read-only for an arbitrary number 
of processes. 

(2) Read-only for an arbitrary number 
of processes and read-write for ex¬ 
actly one process. 

(3) Read-write for exactly one process. 

(4) Read-write for an arbitrary number 
of processes. 

Here, we assume that processes execute on 
different processors. Type 1 implies that 
the variable is cacheable, such as all ele¬ 
ments of the shared matrix A and vector b 
in the iterative algorithm of Figure 4. Type 
2 implies that at most the read-write pro¬ 
cess may cache the variable and that main 
memory is always made consistent. Using a
write-through update policy achieves this.
Type 3 allows the variable to be cached and 
updated using copy-back, as for the shared 
variables in the critical sections of the 
producer and consumer code in Figure 3. 
Finally, for type 4 we must mark the vari¬ 
able as noncacheable. Consider, for ex¬ 
ample, synchronization variables such as 
those implementing the mutexbegin, 
mutexend, and barrier synchronization of 
Figures 3 and 4. 
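
The four access types and the caching rule attached to each can be summarized in a few lines. This is a minimal sketch of the classification as described above; the enum, the function name, and the representation of accesses as sets of process identifiers are assumptions introduced for illustration, and a real compiler would derive these sets from data dependence analysis.

```python
from enum import Enum


class AccessType(Enum):
    READ_ONLY_MANY = 1       # type 1
    READ_MANY_WRITE_ONE = 2  # type 2
    READ_WRITE_ONE = 3       # type 3
    READ_WRITE_MANY = 4      # type 4


# Caching rule per type, paraphrasing the text.
POLICY = {
    AccessType.READ_ONLY_MANY: "cacheable by every process",
    AccessType.READ_MANY_WRITE_ONE: "cacheable by the writer only, write-through",
    AccessType.READ_WRITE_ONE: "cacheable, copy-back",
    AccessType.READ_WRITE_MANY: "noncacheable",
}


def classify(readers, writers):
    """Classify the accesses to one shared variable within one computational unit.

    `readers` and `writers` are sets of process identifiers.
    """
    if not writers:
        return AccessType.READ_ONLY_MANY
    if len(writers) == 1:
        if readers <= writers:
            return AccessType.READ_WRITE_ONE
        return AccessType.READ_MANY_WRITE_ONE
    return AccessType.READ_WRITE_MANY
```

For instance, `classify({1, 2, 3}, set())` yields type 1, which is the situation of matrix A and vector b in the iterative algorithm, and `classify({1, 2}, {1, 2})` yields type 4, the case of a synchronization variable.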

Because synchronizations often delimit 
a computational unit, we can apply differ¬ 
ent rules for maintaining a variable’s con¬ 
sistency for different computational units. 
Between computational units, cached 
shared variables must be invalidated be¬ 
fore the next computational unit enters. 
Moreover, main memory must be updated, 
either by using write-through update pol¬ 
icy or by flushing the content of the cache 
if a copy-back policy is used. 

Computational units are easily identi¬ 
fied if they are explicit in the program 
code, such as the parallel for-loops in the 
iterative algorithm. In the first parallel 
loop, all elements of xtemp are type 3, 
while all elements of A, b, and x are type 1. 
In the second parallel loop, all elements of 
vectors x and xtemp are type 3, which 
makes it possible to cache all shared vari¬ 
ables provided that all elements of vector x 
are invalidated at the end of the second 
parallel for-loop and main memory is 
consistent at the beginning of the iteration. 
The critical sections associated with the 
bounded-buffer algorithm provide another 
example of a computational unit. 

Typically, the compiler’s main task is to 
analyze data dependencies and generate 
appropriate cache instructions to control 
the cacheability and invalidation of shared
variables. The data dependence analysis 
itself, an important and sometimes com¬ 
plex task, lies outside the scope of this 
article. Interested readers should consult 
the references in Cheong and Veidenbaum. 13
Instead, we will look at different
approaches to enforcing cache coherence 
and investigate the hardware support im¬ 
plied by these schemes. The first three 
approaches rely on parallel for-loops to 
classify the cacheability of shared vari¬ 
ables. The last approach relies on critical 
sections as a model for accessing shared 
read-write data. 

Cache coherence enforcement 
schemes. In the first approach, proposed 
by Cheong and Veidenbaum, all shared 
variables accessed within a computational 
unit receive equal treatment; either all or 
none can be cached. This scheme assumes 
a write-through cache, which guarantees 
up-to-date memory content. Moreover, it 
assumes three cache instructions: Cache- 
On, Cache-Off, and Cache-Invalidate. 
Cache-On turns caching on for all shared 
variables when all shared accesses are 
read-only (type 1) or exclusively accessed 
by one process (type 3). Cache-Off results 
in all shared accesses bypassing cache and 
going to the shared memory. 

After execution of a computational unit, 
the Cache-Invalidate instruction invali¬ 
dates all cache content. Invalidating the 
whole cache content, called indiscriminate 
invalidation, has the advantage of being 
easy to implement efficiently. However, 
indiscriminate invalidation is too conser¬ 
vative and leads to an unnecessarily high 
cache-miss ratio. For instance, in the itera¬ 
tive algorithm, caching is turned on, allow¬ 
ing all variables to be cached. However, 
invalidations needed after each barrier 
synchronization result in unnecessary 
misses for accesses to the read-only matrix 
A and vector b. 
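
The cost of indiscriminate invalidation is easy to reproduce in a toy model. The sketch below assumes a write-through cache with one word per line and models the three instructions as methods; the class and its miss counter are hypothetical and serve only to illustrate the behavior described above.

```python
class WriteThroughCache:
    """Toy write-through cache driven by Cache-On, Cache-Off, and Cache-Invalidate."""

    def __init__(self, memory):
        self.memory = memory  # shared dict: address -> value
        self.lines = {}       # private cached copies
        self.caching = False
        self.misses = 0

    def cache_on(self):
        self.caching = True

    def cache_off(self):
        self.caching = False

    def cache_invalidate(self):
        # Indiscriminate invalidation: every line is dropped.
        self.lines.clear()

    def read(self, addr):
        if not self.caching:
            return self.memory[addr]  # bypass the cache
        if addr not in self.lines:
            self.misses += 1
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.memory[addr] = value  # write-through keeps memory up to date
        if self.caching or addr in self.lines:
            self.lines[addr] = value
```

Reading a read-only element twice with caching on costs one miss; after `cache_invalidate()` the next read of the same element misses again even though its value never changed, which is exactly the unnecessary miss noted for matrix A and vector b.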

Selective invalidation of only those 
variables that can introduce inconsistency 
would improve this scheme. It is important 
to implement selective invalidation effi¬ 
ciently. Cheong and Veidenbaum 13 pro¬ 
posed a scheme with these objectives. In 
this scheme, shared-variable accesses 
within a computational unit are classified 
as always up to date or possibly stale. The 
scheme assumes three types of cache instructions
to support this, namely, Memory-Read,
Cache-Read, and Cache-Invalidate.
Memory-Read marks a reference whose
cached copy is possibly stale, whereas
Cache-Read marks a reference whose cached
copy is guaranteed to be up to date.
Furthermore, the scheme assumes the cache
uses write-through.

Table 4. Comparison of the number of invalidation misses for the software-based schemes and a write-invalidate hardware-based scheme for the iterative algorithm.

         Indiscriminate   Fast Selective   Timestamp   Write-invalidate
         Invalidation     Invalidation     Scheme      Hardware Scheme
Loop 1   N(L+1) + L       N                N           N
Loop 2   L                L                0           0
Sum      N(L+1) + 2L      N + L            N           N

Associated with each cache line is a 
change bit. The Cache-Invalidate instruc¬ 
tion sets all change bits true. If a Memory- 
Read is issued to a cache block with its 
change bit set true, then the read request 
will be passed to memory. When the re¬ 
quested block is loaded into the cache, the 
change bit is set false and subsequent ac¬ 
cesses will hit in the cache. 
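
The change-bit mechanism can be sketched as follows. This is an illustration under assumptions, not the published hardware design: lines hold one word, a Cache-Read is assumed to ignore the change bit, and the class and method names are hypothetical.

```python
class SelectiveInvalidationCache:
    """Toy write-through cache with one change bit per line."""

    def __init__(self, memory):
        self.memory = memory  # shared dict: address -> value
        self.data = {}        # cached values
        self.change = {}      # change bit per cached address
        self.misses = 0

    def _fetch(self, addr):
        self.misses += 1
        self.data[addr] = self.memory[addr]
        self.change[addr] = False  # freshly loaded, so trusted again
        return self.data[addr]

    def cache_read(self, addr):
        # Marked by the compiler as always up to date: the change bit is ignored.
        if addr in self.data:
            return self.data[addr]
        return self._fetch(addr)

    def memory_read(self, addr):
        # Marked as possibly stale: hit only if the change bit is clear.
        if addr in self.data and not self.change[addr]:
            return self.data[addr]
        return self._fetch(addr)

    def write(self, addr, value):
        self.memory[addr] = value  # write-through
        self.data[addr] = value
        self.change[addr] = False

    def cache_invalidate(self):
        # Sets every change bit; no data is discarded.
        for addr in self.change:
            self.change[addr] = True
```

After `cache_invalidate()`, a Cache-Read of a read-only element still hits, while the first Memory-Read of a possibly stale element goes to memory and later ones hit again.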

To demonstrate this method, consider 
once again the iterative algorithm of Fig¬ 
ure 4. Assume that the block size is one 
vector element and that n = N/L processors 
cooperate in the execution of the parallel 
loops. Each processor executes L itera¬ 
tions. Figure 11 includes comments for all 
read operations to shared data with the 
cache instructions supported by the 
scheme. The only sources of inconsistency 
are the accesses to vectors x and xtemp. 
This means the only references that need 
marking as Memory-Reads are when x is 
read in the first parallel loop and when 
xtemp is read in the second parallel loop. 
Clearly, this scheme eliminates cache 
misses on the accesses to vector b and 
matrix A. Just turning on all change bits 
efficiently accomplishes the fast selective 
invalidation scheme in one cycle. 

Although this scheme reduces the number
of invalidation misses, it is still conserva¬ 
tive because the same processor might 
execute the same iterations in the two par¬ 
allel loops. If so, the corresponding ele¬ 
ments of xtemp will be unnecessarily in¬ 
validated and reread from memory in the 
second parallel loop. 

A third scheme takes advantage of this 
temporal locality: the timestamp-based 
scheme proposed by Min and Baer. 14 This 
scheme associates a “clock” (a counter) 
with each data structure, such as vectors x 
and xtemp in the iterative algorithm. This 
clock is updated at the end of each computational
unit (at the barrier synchronizations
in the algorithm) in which the corresponding
variable is updated. For example,
the clock for vector xtemp is updated after 
the first parallel loop, and the clock for 
vector x is updated after the second loop. A 
timestamp associated with each block in 
the cache (for example, with each vector 
element) is set to the value of the corre¬ 
sponding clock+1 when the block is up¬ 
dated in the cache. A reference to a cache 
word is valid if its timestamp exceeds its 
associated clock value. Otherwise, the 
block must be fetched from memory. 

This scheme eliminates invalidations 
associated with variables local to a proces¬ 
sor between two computational units 
because the timestamp value for these 
variables exceeds their clock value. The 
hardware support for this scheme consists 
of a number of clock registers and a timestamp
entry for each cache line in the cache
directory. 
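
A small model shows how timestamps let locally written data survive a barrier. The sketch rests on two assumptions that go beyond the description above and may differ from the original proposal: a cached block is accepted when its timestamp is at least the clock value (with a strict comparison, a block stamped clock + 1 would be rejected as soon as the clock advances by one), and a block fetched on a miss is stamped with the current clock. All names are hypothetical.

```python
class TimestampCache:
    """Toy write-through cache with one timestamp per cached block."""

    def __init__(self, memory, clocks):
        self.memory = memory  # shared dict: (structure, index) -> value
        self.clocks = clocks  # shared dict: structure -> clock value
        self.lines = {}       # (structure, index) -> (value, timestamp)
        self.misses = 0

    def write(self, structure, index, value):
        key = (structure, index)
        self.memory[key] = value
        # A block updated in the cache is stamped clock + 1.
        self.lines[key] = (value, self.clocks[structure] + 1)

    def read(self, structure, index):
        key = (structure, index)
        line = self.lines.get(key)
        # Assumption: valid when timestamp >= clock.
        if line is not None and line[1] >= self.clocks[structure]:
            return line[0]
        self.misses += 1
        value = self.memory[key]
        # Assumption: a block loaded on a miss is stamped with the current clock.
        self.lines[key] = (value, self.clocks[structure])
        return value


def end_of_unit(clocks, updated_structures):
    """Advance the clock of every structure updated in the finished unit."""
    for structure in updated_structures:
        clocks[structure] += 1
```

A processor that writes xtemp[J] in the first loop can read it in the second loop without a miss, whereas an element of x that it merely read becomes stale once the clock for x advances.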

Let’s compare the number of invalida¬ 
tion misses generated by the schemes pre¬ 
sented so far and compare these numbers 
with the corresponding number for a write- 
invalidate hardware scheme. Assume that 
n = N/L processors execute the iterations 
and that each processor always executes 
the same iterations. This means that
processor i executes iterations (i-1)L + 1 to
iL in the parallel loops in the iterative 
algorithm. Assume a block size corre¬ 
sponding to one vector element. Table 4 
shows the result. 
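
To compare the schemes for concrete problem sizes, the expressions of Table 4 can simply be evaluated. The sketch below does only that; the function name is hypothetical, and the zero entries for the second loop of the timestamp and hardware schemes are inferred from the Sum row rather than read from the table.

```python
def invalidation_misses(N, L):
    """Evaluate the Table 4 expressions for N vector elements and L iterations per processor."""
    per_loop = {
        "indiscriminate": (N * (L + 1) + L, L),
        "fast_selective": (N, L),
        "timestamp": (N, 0),            # second entry inferred from the Sum row
        "write_invalidate_hw": (N, 0),  # second entry inferred from the Sum row
    }
    return {
        scheme: {"loop1": a, "loop2": b, "sum": a + b}
        for scheme, (a, b) in per_loop.items()
    }
```

With N = 1024 and L = 16, indiscriminate invalidation gives 17,440 misses, fast selective invalidation 1,040, and both the timestamp and the hardware scheme 1,024.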

The table makes it clear that the last 
scheme results in the same number of in¬ 
validation misses as does any write-invali¬ 
date hardware-based scheme. The hard¬ 
ware support for the different schemes 
differs in complexity. The indiscriminate 
invalidation scheme requires only a 
mechanism for turning on or off and invali¬ 
dating the cache. The fast selective invali¬ 
dation scheme requires one bit for each 
cache line (the change bit), whereas the 
timestamp-based scheme requires a timestamp
register with several bits for each
cache line and a number of clock registers 
(two in this algorithm). 

Compared to the hardware support for a 
write-invalidate protocol, especially for 
multiprocessors with a large number of 
processors, software-based schemes are 
clearly very cost-effective. This compari¬ 
son yields another important observation: 
Software-based schemes are competitive 
with hardware-based schemes, at least for 
this simple example. In another article in 
this issue, Cheong and Veidenbaum pres¬ 
ent another scheme that preserves tempo¬ 
ral locality between computational units. 
They also present a performance compari¬ 
son between many of the schemes re¬ 
viewed in this section. 

To end this section, let’s consider a soft¬ 
ware-based scheme proposed by Smith and 
reviewed elsewhere. 13 It differs from the 
others by relying on the fact that accesses 
to shared variables always take place in 
critical sections, as in the bounded-buffer 
algorithm of Figure 3. 

The scheme maintains consistency by 
selectively invalidating all shared vari¬ 
ables associated with a critical section, as 
follows: All shared variables in a critical 
section are allocated to the same page. A 
one-time identifier (OTI) is associated 
with each page. When a cache block is 
loaded into the cache, the corresponding 
OTI is loaded from the translation-lookaside
buffer (TLB, an address translation
mechanism) into the entry in the cache 
directory that corresponds to the cache 
line. For an access to hit in the cache, the 
stored OTI must match the OTI in the TLB. 
Fast invalidation of all shared variables 
associated with a critical section can now 
be done by simply changing the OTI for the 
corresponding page. This scheme’s inter¬ 
esting feature is the fast selective invalida¬ 
tion mechanism. 
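
The one-time identifier mechanism can be modeled in a few lines. This is a minimal sketch under assumptions: addresses are plain integers, a line holds one word, fresh identifiers come from a counter, and the class and method names are hypothetical.

```python
import itertools


class OTICache:
    """Toy cache in which each line remembers the OTI of its page at load time."""

    def __init__(self, memory, page_size):
        self.memory = memory          # shared dict: address -> value
        self.page_size = page_size
        self._fresh = itertools.count(1)
        self.tlb_oti = {}             # page -> current OTI (held in the TLB)
        self.lines = {}               # address -> (value, OTI stored with the line)
        self.misses = 0

    def _page_oti(self, addr):
        page = addr // self.page_size
        if page not in self.tlb_oti:
            self.tlb_oti[page] = next(self._fresh)
        return self.tlb_oti[page]

    def read(self, addr):
        oti = self._page_oti(addr)
        line = self.lines.get(addr)
        # A hit requires the stored OTI to match the OTI in the TLB.
        if line is not None and line[1] == oti:
            return line[0]
        self.misses += 1
        value = self.memory[addr]
        self.lines[addr] = (value, oti)
        return value

    def invalidate_page(self, page):
        # Changing the page's OTI invalidates all of its lines at once.
        self.tlb_oti[page] = next(self._fresh)
```

Calling `invalidate_page` touches a single TLB field, yet every cached word of that page misses on its next access, while lines belonging to other pages are unaffected.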

Software-based schemes have not yet 
been used in any commercial systems. 
However, many of the ideas presented in 
this section are being tested in the experi¬ 
mental system Cedar at the University of 
Illinois at Urbana-Champaign. 


Despite extensive study of the cache
coherence problem, many pos¬ 
sible research directions remain. 
First, most of the schemes presented here, 
except for snoopy cache protocols, have 
never been implemented. We can only 
evaluate them in real implementations. 
Second, since the area of parallel processing
remains immature, we face a paucity of
real-life applications for large multiprocessors.
This makes it difficult to evaluate
these ideas under realistic assumptions.
Third, as we have seen, the design space 
for multiprocessor caches is large and in¬ 
volves complicated trade-offs. For ex¬ 
ample, a definite need exists for experi¬ 
mental research to explore performance 
differences between software-based and 
hardware-based schemes. 

The advent of high-speed reduced in¬ 
struction set computer (RISC) micropro¬ 
cessors with an increased memory band¬ 
width requirement will put an increased 
burden on the memory system for future 
multiprocessors. Therefore, multiproces¬ 
sor caches are and will be a hot topic in the 
coming years. ■ 
Translation-Lookaside Buffer Consistency

A translation-lookaside buffer is a
special-purpose, virtual-address 
cache required to implement a 
paged virtual memory efficiently. Shared- 
memory multiprocessors with multiple 
TLBs, also known as translation buffers or 
directory-lookaside tables, give rise to a 
special case of the cache consistency prob¬ 
lem, which can occur when multiple im¬ 
ages of data can reside in multiple distinct 
caches, as well as in main memory. If one 
of these images is modified, then the others 
become inconsistent with the modified 
data image and no longer represent a valid 
image of the data. 

A processor accesses a TLB entry to 
determine the memory location and accessibility
of referenced data. TLB entries
cache this information, which is stored in
data structures called page tables. Since
multiple processors can read and write page tables, they
can make corresponding information in 
TLBs stale, which in turn can cause errone¬ 
ous memory accesses and incorrect pro¬ 
gram behavior. 

We can solve this problem, called the
TLB consistency problem, either by ensuring
consistency between information in
page tables and TLBs or by preventing the
use of inconsistent TLB entries. In this
article, I discuss nine solutions proposed in
the literature. Three of these require
virtual-address, general-purpose caches kept
consistent by special-purpose hardware.
Although I describe the general idea behind
these solutions, I concentrate on the
others, since I am particularly interested in
identifying solutions for scalable architectures
without hardware cache consistency,
especially multiprocessors with a multistage
network interconnecting processors
and memory. In scalable multiprocessor
architectures, the number of processors
and memory modules can increase with the
dimensions of the network, so a solution to
TLB inconsistency should meet the needs
of tens, hundreds, or thousands of processors.

Shared-memory multiprocessors with multiple translation-lookaside buffers must deal with a cache consistency problem. This article describes nine solutions.

One of a multiprocessor’s main goals is 
to increase the execution speed of applica¬ 
tion programs by distributing the computa¬ 
tional load among multiple processors. 
Therefore, it is important that the perform¬ 
ance overhead for a solution to TLB incon¬ 
sistency does not reduce the possible 
speedups in these computing environ¬ 
ments. The overhead includes processor 
execution and idle time attributable to 
maintaining TLB consistency and the 
implicit side effects of a particular solution 
(such as serialized page-table modifications,
increased page-ins and TLB misses,
and the inability to use time-saving optimizations).
In scalable multiprocessor architectures,
the average time to satisfy a
memory request grows with system size. 
Accordingly, the importance of caches, 
TLBs, and efficient solutions to the cache 
and TLB consistency problems also grows 
with system size. 

With these points in mind, I discuss the 
six solutions that do not require virtual- 
address caches kept consistent by hard¬ 
ware (see Table 1). For each solution, I 
describe the algorithm, the hardware, the 
limitations with respect to the memory-sharing
patterns it supports, and the cost in
processor execution time and multiprocessor
performance. It is difficult to determine
which solution is most suitable for a scalable
architecture. Since small-scale multiprocessors
and prototypes of large-scale
multiprocessors have been built, the performance
of some solutions can be evaluated
in these systems. However, appropriate
large-scale multiprocessors are not
available. Also, factors that could determine
a solution’s efficiency have not yet
been measured (such as the frequency of
page-table modifications, the rate at which
TLB inconsistencies occur, and the degree
of memory sharing among processes cooperating
in program execution). These factors
are of particular interest in scalable
architectures, since a solution’s efficiency
becomes even more important if these
measurements increase with system size.

Table 1. Summary of solutions to the translation-lookaside buffer consistency problem.

Solution (strategy): TLB shootdown (invalidate)
  Required hardware: Interprocessor interrupt and architected TLB entry invalidation.
  Memory-sharing limitations: None.
  Performance overhead: Processors that might be using a page table that is being modified are forcibly interrupted and idled until the unsafe change has been made. Participation of the modifying processor depends on the number of other participating processors. Overhead also includes interprocessor communication and synchronization, serialization of unsafe changes to a page table, and serialization of page-table modification and use.

Solution (strategy): Modified TLB shootdown (invalidate)
  Required hardware: High-priority interprocessor interrupt, architected TLB entry invalidation, and hardware support for atomic operations.
  Memory-sharing limitations: None.
  Performance overhead: Overhead is the same as for TLB shootdown with two exceptions: Interrupted processors are not idled, and their participation is small and possibly constant. Page-table modification and use are not serialized.

Solution (strategy): Lazy devaluation (delay & invalidate)
  Required hardware: Extra TLB field, interprocessor interrupt, and architected TLB entry invalidation.
  Memory-sharing limitations: No parallel execution in same address space or remote address-space modification.
  Performance overhead: In some cases, the modifying processor independently updates the TLBID or invalidates TLB entries. In other cases, overhead is the same as for TLB shootdown.

Solution (strategy): Read-locked TLBs (avoid & postpone)
  Required hardware: Interprocessor interrupt and architected TLB entry invalidation.
  Memory-sharing limitations: No parallel execution in same address space or remote address-space modification. Copy-on-write sometimes forfeited.
  Performance overhead: The counter is updated on each TLB reload, invalidation, and replacement. When an unsafe change cannot be postponed, overhead is the same as for TLB shootdown.

Solution (strategy): Memory-based TLBs (invalidate)
  Required hardware: TLBs with bus monitors interconnected by bus, virtual-address caches, and sufficient network bandwidth.
  Memory-sharing limitations: Address-space identifier or single, global address space. Given multiple memory clusters, pages in different clusters cannot map to same page and virtual-memory management is restricted.
  Performance overhead: The TLB notifies the processor of a page fault, protection exception, and virtual-memory deallocation. Also, memory requests contain virtual addresses, and there might be contention for a cluster bus and TLBs.

Solution (strategy): Validation (detect & correct)
  Required hardware: Validation tables, extra TLB field, comparators, and sufficient network bandwidth.
  Memory-sharing limitations: None.
  Performance overhead: Memory requests contain a generation count. The modifying processor updates the generation count. An extra network trip is needed when a stale TLB entry is used. Overhead also includes a solution to the generation-count wraparound problem.

Since I do not have data to quantitatively 
evaluate these solutions, I compare their 
potential effects on multiprocessor per¬ 
formance and the limitations they impose 
on memory sharing. When implemented in 
a multiprocessor with a shared bus inter¬ 
connecting processors and memory, the 
overhead is very low for the three solutions
that require virtual-address caches kept
consistent by hardware, since only the
processor modifying a page table participates
in the algorithm, and this processor’s
related execution time is small and constant.

Although solutions for multiprocessors 
with more general interconnects do not
have such small overheads, two solutions 
— memory-based TLBs and validation — 
come close to achieving this goal. Such a 
small overhead might not be necessary, 
however, if TLB inconsistencies are rare 
events and memory sharing among pro¬ 
cesses is limited. In this case, a solution 
called TLB shootdown, which requires 
essentially no hardware support, might be 
adequate, even though interprocessor 
communication and synchronization are 
inherent in its algorithm. 

The number of ways in which TLB in¬ 
consistencies can arise is a function of the 
generality of memory-sharing patterns 
among processes. Of the six solutions, TLB 
shootdown, modified TLB shootdown, and 
validation solve the TLB consistency prob¬ 
lem for all memory-sharing patterns. If the 
targeted multiprocessor has only one clus¬ 
ter of memory modules, then memory- 
based TLBs joins this group. Otherwise, 
memory-based TLBs, as well as lazy de¬ 
valuation and read-locked TLBs, solve the 
problem for only a limited set of memory¬ 
sharing patterns. 

I first define the function of a TLB and 
describe how TLB inconsistency arises 
both in uniprocessor and multiprocessor 
architectures. Then, after explaining how 
the problem of TLB consistency is solved 
in a uniprocessor and in multiprocessors 
with a shared bus, virtual-address caches, 
and hardware cache consistency, I describe 
solutions that can be implemented in multi¬ 
processors with more general interconnec¬ 
tion networks and without hardware cache 
consistency. 

Virtual memory and 
TLBs 

Without a TLB, two or three memory 
accesses might be needed to satisfy a data 
request. To understand why this is so, 
consider the organization and management 
of a paged virtual-memory system and the 
function of a TLB. 

A hierarchical memory system that 
supports paged virtual memory includes 
both main and auxiliary memories. Main 


memory is divided into frames, each con¬ 
taining an equal number of memory loca¬ 
tions. Virtual memory comprises a set of 
virtual pages, the size of each being a 
multiple of the corresponding frame size. 
An image of a virtual page can reside in 
both main and auxiliary memory, but only 
the image in main memory can be read or 
written. Therefore, these two images are 
often inconsistent. 

While executing a program, a processor 
references data by virtual address. The 
high-order bits of a data item’s virtual 
address identify the page where the data is 
stored, and the low-order bits indicate the 
displacement from the beginning of the 
page to the data’s location. Although data 
is referenced by virtual address, data often 
can be accessed only when its virtual ad¬ 
dress is translated to a physical address, 
where the high-order bits identify a frame 
and the low-order bits identify a displace¬ 
ment. Address translation is not necessary 
if a virtual-address, general-purpose cache 
is associated with each processor and the 
referenced data is cacheable and cache 
resident. 
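
As a rough illustration of the address split just described, the following Python sketch separates a virtual address into a page number and a displacement, then forms a physical address from a frame number. The 4-Kbyte page size, the equal page and frame sizes, the dictionary standing in for a page table, and all function names are assumptions made for the example, not details of any machine discussed here.

```python
PAGE_SIZE = 4096                              # assumed page (and frame) size in bytes
OFFSET_BITS = PAGE_SIZE.bit_length() - 1      # 12 low-order bits hold the displacement


def split_virtual_address(vaddr):
    """Return (page number, displacement) for a virtual address."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def translate(vaddr, page_to_frame):
    """Concatenate the frame number with the unchanged displacement."""
    page, displacement = split_virtual_address(vaddr)
    frame = page_to_frame[page]               # a KeyError stands in for a page fault
    return (frame << OFFSET_BITS) | displacement


# Hypothetical mapping: virtual page 2 resides in frame 7.
physical = translate(0x2ABC, {2: 7})
```

Only the high-order bits change during translation; the displacement passes through untouched.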

Virtual-to-physical address translation 
information is stored in a page table, which 
has an entry for each page. The informa¬ 
tion includes 

• the location of the page in physical 
memory (a frame number); 

• a protection field indicating how the 
page can be accessed, for example, 
read/write or read-only; 

• a valid bit (sometimes called a resident 
bit) indicating if the frame number is 
valid; and 

• a modify bit (sometimes called a dirty 
bit) indicating if the page was modi¬ 
fied since it became main-memory 
resident, that is, if an exact image of 
the page resides in auxiliary memory. 
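
The four fields listed above can be pictured as a small record. The sketch below is only one possible layout; the field names, the Boolean encoding of the protection field, and the helper `needs_write_back` are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageTableEntry:
    frame: int = 0            # location of the page in physical memory
    writable: bool = False    # protection field: False = read-only, True = read/write
    valid: bool = False       # valid (resident) bit: is the frame number usable?
    modified: bool = False    # modify (dirty) bit: changed since becoming resident?


def needs_write_back(entry):
    """A resident page lacks an exact image in auxiliary memory only if it
    has been modified since it became main-memory resident."""
    return entry.valid and entry.modified
```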

TLBs give processors fast access to 
translation information for recently refer¬ 
enced pages, so the processors need not 
access a page table whenever address 
translation is necessary. Since programs 
generally exhibit locality of reference, a 
process usually accesses the same page a 
number of times during a certain time inter¬ 
val. Thus, a TLB often eliminates the need 
for a processor to access translation infor¬ 
mation from main memory. Measurements 
on the VAX-11/780 reveal that translation 
information can be accessed from a TLB 
rather than from a page table more than 97 
times out of 100; that is, the TLB miss rate 
is approximately 3 percent. 1 Without a 


TLB, a data request might require one or 
two additional memory accesses. Two 
accesses might be needed when virtual 
memory is organized in segments, each 
containing a set of pages. In this case, the 
processor might also have to access a data 
structure called a segment table to get the 
location of the page table containing the 
referenced page’s translation information. 
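
A back-of-the-envelope calculation shows what the quoted miss rate implies. The sketch below assumes a deliberately simple cost model (one memory access for the data itself, a free TLB hit, and one or two extra accesses on a TLB miss) and ignores caches entirely; the function name is hypothetical.

```python
def accesses_per_request(miss_rate, extra_accesses):
    """Average memory accesses per data request under the simple model above."""
    return 1 + miss_rate * extra_accesses


paged_only = accesses_per_request(0.03, 1)    # page table only
segmented = accesses_per_request(0.03, 2)     # segment table plus page table
without_tlb = accesses_per_request(1.0, 2)    # every request walks both tables
```

Under these assumptions a 3 percent miss rate adds only a few hundredths of an access per request, whereas a system without a TLB pays the full two or three accesses every time.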

Figure 1 illustrates the data paths of a 
typical physical-address cache and TLB 
design, where the page and frame sizes are 
assumed to be equal. In this case, cache 
access and address translation can overlap 
to some extent. 2 A processor uses the high- 
order bits of the virtual address (the page 
number) to get the referenced page’s trans¬ 
lation information from the TLB. A TLB 
hit occurs if a valid TLB entry exists for the 
page. The number of the frame containing 
the page is then concatenated with the low- 
order bits of the virtual address, yielding 
the referenced data’s physical address, 
which can be used to access either the 
cache or main memory. Otherwise, a TLB 
miss occurs, initiating a TLB reload. Using 
the virtual page number to form the ad¬ 
dress of the page’s page-table entry, a TLB 
reload accesses and loads in the TLB a 
copy of the translation information stored 
in the page table. In general, one entry must 
be removed from the TLB to make room 
for another. 
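
The hit, miss, and reload sequence can be sketched as follows. The fully associative organization, the least-recently-used replacement policy, and the dictionary used as a page table are assumptions for the example; the text above does not prescribe a replacement policy.

```python
from collections import OrderedDict


class TLB:
    """Small fully associative TLB with LRU replacement (an assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()          # page number -> frame number
        self.hits = 0
        self.misses = 0

    def lookup(self, page, page_table):
        if page in self.entries:              # TLB hit
            self.hits += 1
            self.entries.move_to_end(page)    # mark as most recently used
            return self.entries[page]
        self.misses += 1                      # TLB miss: reload from the page table
        frame = page_table[page]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # remove one entry to make room
        self.entries[page] = frame
        return frame


page_table = {0: 5, 1: 9, 2: 4}               # hypothetical page -> frame mapping
tlb = TLB(capacity=2)
frames = [tlb.lookup(p, page_table) for p in (0, 1, 0, 2, 1)]
```

In this run the third reference hits, while the references to pages 2 and 1 each displace an older entry.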

When the referenced page is not in main 
memory, a page fault occurs. If the page is 
not already becoming resident, such a 
process, called a page-in, begins. A page-in
allocates a frame to the page and, if
necessary, copies the page’s image in 
auxiliary memory to the allocated frame. If 
all physical memory has been allocated, a 
page-in evicts another page from main 
memory. Before a page is evicted, how¬ 
ever, auxiliary memory must contain an 
exact image of it; this could require writing 
the evicted page to auxiliary memory be¬ 
fore overwriting it with the newly refer¬ 
enced page. 

Multiple processes can access, or share, 
a page by either a one-to-one or a many-to- 
one mapping of pages to frames. General 
memory-sharing patterns are produced 
when 

• processes sharing a page via a many- 
to-one mapping can execute concur¬ 
rently on different processors; 

• multiple processes can execute con¬ 
currently in the same address space 
(the set of virtual addresses a process 
can reference); and 

• a process can modify the translation
information of a page in the address
space of another process, which might
be executing concurrently on a different
processor.

Figure 1. Data paths of a typical physical-address cache and TLB design.

When pages are mapped to frames on a 
many-to-one basis and process identifiers 
are not affixed to virtual page numbers, 
multiple processes can refer to the same 
page with different virtual page numbers. 
Therefore, switching processor execution 
from one process to another, called a con¬ 
text switch, requires invalidating all TLB 
entries. This invalidation is called a TLB 
flush. If pages are mapped to frames on a 
one-to-one basis or process identifiers are 
affixed to virtual page numbers, a page is 
always referenced with a distinct page 
number. Therefore, a context switch in a 
uniprocessor does not require a TLB flush. 
This is also the case in a multiprocessor if 
TLB consistency is guaranteed when pro¬ 
cesses migrate among processors. 
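
The difference between the two cases can be sketched with two toy TLB models. Both classes and their method names are hypothetical; the tagged variant simply keys each entry by a process identifier together with the virtual page number.

```python
class UntaggedTLB:
    """Entries carry no process identifier, so a context switch flushes them."""

    def __init__(self):
        self.entries = {}                     # page number -> frame number

    def context_switch(self):
        self.entries.clear()                  # TLB flush


class TaggedTLB:
    """Entries are keyed by (process identifier, page number); no flush needed."""

    def __init__(self):
        self.entries = {}
        self.current = None

    def context_switch(self, pid):
        self.current = pid                    # entries of other processes stay put

    def reload(self, page, frame):
        self.entries[(self.current, page)] = frame

    def lookup(self, page):
        return self.entries.get((self.current, page))   # None means a TLB miss


tagged = TaggedTLB()
tagged.context_switch(pid=1)
tagged.reload(0, 5)
tagged.context_switch(pid=2)
tagged.reload(0, 8)                           # same page number, different process
tagged.context_switch(pid=1)
frame_for_p1 = tagged.lookup(0)
```

Process 1 still finds its own translation after process 2 has run, because the identifier disambiguates the shared page number.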

The problem of TLB 
consistency 

More than one image of a page’s transla¬ 
tion information can exist: one is stored in 
the page table and others can be stored in 
TLBs. Processes can have access to mul¬ 
tiple images, using TLB images for vir- 
tual-to-physical address translation and 
accessing the page-table image to perform 
TLB reloads or modify translation infor¬ 
mation. Therefore, page-table modifica¬ 
tions can make TLB entries inconsistent 
with the page-table entries they are sup¬ 
posed to mirror. Since inconsistent TLB 
entries can generate erroneous memory 
accesses, this TLB consistency problem 
must be solved by ensuring consistency or 
preventing the use of inconsistent entries. 

TLB inconsistencies resulting from 
updates of page-reference history informa¬ 
tion (such as bits that record if a page has 
been modified or referenced during some 
time interval) are harmless if precautions 
are taken when updating page tables. 
However, inconsistencies resulting from 
other page-table modifications, catego¬ 
rized as safe and unsafe changes, can be 
harmful. 

Page-reference history information 
need not be consistent among TLBs and 
page tables. However, it is crucial that the 
reference history information stored in 
page tables reflect states that will not cause 
erroneous program behavior. Information 
can be lost if a modify bit that indicates 
whether a page has been written is inaccu¬ 


rate, since the only accurate representation 
of a page (stored in main memory) can be 
overwritten if it does not also reside in 
auxiliary memory. On the other hand, a bit 
that indicates whether a page has been 
referenced need not be accurate, since it is 
generally used as a heuristic to select a 
page to replace in main memory; replacing 
one page instead of another does not cause 
errors, although it can have performance 
consequences. In any case, if processors 
can concurrently modify a page-table entry
(for example, if one can change a page’s
mapping while another sets the modify 
bit), then a page’s reference history must 
be updated in such a way that it does not 
corrupt current page-table information. 

A safe change results from 

• a reduction in page protection, such as 
the modification of access rights from 
read-only to read/write, or 

• a page’s becoming main-memory resident.

We can avoid using a TLB entry that is 
inconsistent due to a safe change by de¬ 


signing the hardware so that using the entry 
generates an exception, which invokes an 
operating-system trap routine that invali¬ 
dates or corrects the TLB entry. For ex¬ 
ample, suppose the translation information 
for page a, stored in a valid TLB entry 
accessed by processor X, defines the page’s
protection as read-only. Also, suppose a’s 
page-table entry is subsequently modified 
by a process executing on processor Y to 
reflect a change in protection from read¬ 
only to read/write. When a process execut¬ 
ing on X attempts to write a, the operating 
system intervenes, since the TLB entry for 
a that X accessed defines the protection as 
read-only. Checking a’s page-table entry, 
the operating system finds that the protection
has been reduced and invalidates the
stale TLB entry. After X reloads a consis¬ 
tent TLB entry for a and resumes execution 
(assuming X hasn’t started executing a 
different process), the modification of a 
will be successfully and correctly exe¬ 
cuted. 
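
The example can be traced in a few lines of Python. This is a sketch of the general idea only: the exception class, the dictionaries standing in for the page table and X's TLB, and the function names are all assumptions, and the trap routine is reduced to a single check.

```python
class ProtectionFault(Exception):
    pass


page_table = {"a": {"frame": 3, "writable": False}}
tlb_x = {"a": dict(page_table["a"])}          # X's TLB holds a copy: read-only


def write(page, tlb, page_table):
    entry = tlb.get(page)
    if entry is None:                         # TLB miss: reload from the page table
        entry = tlb[page] = dict(page_table[page])
    if not entry["writable"]:
        raise ProtectionFault(page)           # hardware raises an exception
    return entry["frame"]


def write_with_trap(page, tlb, page_table):
    try:
        return write(page, tlb, page_table)
    except ProtectionFault:
        if page_table[page]["writable"]:      # TLB entry is stale, not a real violation
            del tlb[page]                     # invalidate the stale entry
            return write(page, tlb, page_table)   # retry reloads a consistent entry
        raise                                 # genuine protection violation


page_table["a"]["writable"] = True            # Y changes read-only to read/write
frame = write_with_trap("a", tlb_x, page_table)
```

The stale entry costs one extra trap, but it never allows an access that the page table forbids, which is why the change is safe.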

In contrast, unsafe changes cause TLB 
inconsistencies that cannot be detected 
during program execution. This class of 
page-table modifications raises the need
for TLB management that ensures consis¬ 
tency. An unsafe change results from 

• the invalidation of a virtual-to-physi- 
cal memory mapping, 

• an increase in page protection, or 

• the remapping of a virtual or physical 
page, that is, a change in the mapping 
of a page to a frame. 

The virtual-to-physical memory map¬ 
ping of a page is invalidated when virtual 
memory is deallocated or when a page is 
evicted from main memory. Virtual mem¬ 
ory is deallocated when virtual pages are 
removed from the address space of a pro¬ 
cess. A shared page’s protection can be 
increased to implement the copy-on-write 
optimization and interprocess communi¬ 
cation. Copy-on-write saves copying over¬ 
head by letting multiple processes share a 
page as a read-only page until one of the 
processes attempts to modify it. 

Unsafe changes can cause TLB incon¬ 
sistency in both uniprocessor and multi¬ 
processor systems, as illustrated in Figure 
2 and described below. Consider a multi¬ 
processor system, and suppose that valid 
entries map page a to frame A in both the 


TLB accessed by processor X and the TLB 
accessed by processor Y. Now suppose that 
a process executing on X maps page b to 
frame A and evicts page a, updating X’s 
TLB accordingly. Unless Y is prevented 
from using a’s inconsistent TLB entry, it
could erroneously access b when attempt¬ 
ing to access a. Now consider X by itself. 
Whether in a uniprocessor or multiproces¬ 
sor system, X faces the same problem as Y 
if it does not update its own TLB after 
modifying the mapping for page a and 
before issuing any other memory requests. 
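
The scenario can be replayed with plain dictionaries to show why the stale entry goes unnoticed. Everything here (the dictionary-based memory, page table, and TLBs) is a toy model assumed for illustration.

```python
frames = {"A": "contents of page a"}          # physical memory, one frame
page_table = {"a": "A"}
tlb_x = {"a": "A"}                            # both TLBs hold a valid entry for a
tlb_y = {"a": "A"}

# A process on X evicts page a and maps page b to frame A,
# updating the page table and X's own TLB, but not Y's.
del page_table["a"]
page_table["b"] = "A"
frames["A"] = "contents of page b"
del tlb_x["a"]
tlb_x["b"] = "A"

# Y translates a through its stale entry; nothing signals an error.
stale_read = frames[tlb_y["a"]]
```

Unlike the safe change shown earlier, no exception is raised: Y silently reads page b's data while believing it is accessing page a.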

Since the maintenance of page-refer¬ 
ence history information is a separate is¬ 
sue, and since TLB entries made inconsis¬ 
tent by safe changes can be detected and 
corrected by the operating system, the term 
“TLB consistency problem” in the rest of 
this article refers only to TLB inconsisten¬ 
cies caused by unsafe changes. 

Solutions to the TLB 
consistency problem 

The TLB consistency problem is easy to 
solve in a uniprocessor, but it is decidedly 
more difficult in a multiprocessor. In a 


multiprocessor architecture with multiple 
TLBs, in addition to a consistency problem 
between a page table and a TLB, a consis¬ 
tency problem exists among the system’s 
multiple TLBs and page tables. Figure 3 
shows a system with N processors and N 
memory modules, where one TLB is asso¬ 
ciated with each processor. The memory 
modules are collected into m clusters of 
size c, where 1 ≤ m ≤ N and cm = N, and a
frame is interleaved across a cluster’s 
memory modules. Processors and memory 
are interconnected by a shared bus or by a 
more general network, such as a multistage 
network. In bus-based architectures, the 
shared bus can help solve the cache and 
TLB consistency problems. In multiprocessors
with more-general interconnection
networks and no bus interconnecting
processors, these problems are more
difficult. 

Although the solutions I describe that 
require virtual-address caches kept consis¬ 
tent by hardware are limited to bus-based 
multiprocessors, the algorithms them¬ 
selves are not so limited. Thus far, hard¬ 
ware cache consistency has been imple¬ 
mented only in bus-based multiprocessors. 
However, distinct solutions to both the 
cache and TLB consistency problems have 
been proposed for multiprocessors with 
more-general interconnection networks. 
For example, directory-based schemes or 
software-assisted cache management can 
solve the cache consistency problem in 
multiprocessors with multistage intercon¬ 
nection networks. Directory-based 
schemes incur an overhead that does not 
scale with the dimensions of the network, 3
so this approach to cache consistency 
might only be suitable for a limited class of 
applications, although it could provide an 
efficient solution to the TLB consistency 
problem. Software-assisted cache management 4,5
depends on information available
to the compiler. This approach could
prove effective for ensuring cache consis¬ 
tency, but it is not suitable for solving the 
TLB consistency problem because infor¬ 
mation required to manage virtual memory 
is available to the operating system and not 
to the compiler. 

Solutions for uniprocessors. In a uni¬ 
processor, the operating system ensures 
TLB consistency by inhibiting context 
switching while the processor modifies a 
page-table entry and updates its TLB ac¬ 
cordingly. No memory accesses occur dur¬ 
ing this time except to access the page table, 
so no inconsistent TLB entries are used to 
access memory. In a shared-memory 
multiprocessor, this approach can only
ensure the consistency of the TLB accessed 
by a processor modifying a page table, so 
no more than one TLB image of the page’s 
translation information is guaranteed to be 
consistent with the page table. 

Solutions requiring virtual-address 
caches and hardware cache consistency. 

In bus-based multiprocessors, we can pre¬ 
vent the use of inconsistent TLB entries by 
not allowing memory accesses after a page 
table’s main-memory image is modified 
and before all TLBs are consistent with the 
modified page table. We accomplish this 
by associating a bus monitor with each 
cache to recognize all memory accesses. 
The TLB consistency solutions proposed 
by Cheriton, Slavenburg, and Boyle, 6 
Goodman, 7 and Wood et al. 8 use this 
method. 

These solutions assume each processor 
has a virtual-address, general-purpose 
cache that stores translation information 
with other data and is kept consistent via a 
chosen protocol. 9 Solving the cache con¬ 
sistency problem also solves the TLB 
consistency problem, since would-be TLB 
entries are stored in, and accessed from, 
the cache. Memory requests transmitted on 
the bus trigger the bus monitors to ensure 
cache consistency. When a page-table 
entry is modified, each monitor checks to 
see if translation information for the asso¬ 
ciated page is cache resident; if so, the 
monitor invalidates or updates the corre¬ 
sponding cache entry. 

Two relevant issues give a general idea 
of how each of these solutions works (I 
have omitted some important implementa¬ 
tion details). First, I show how the bus 
monitor can query the cache when the 
cache is addressed by a virtual address and 
the memory request is targeted to a physi¬ 
cal address. Second, since a many-to-one 
mapping of pages to frames can yield more 
than one cache entry for the same data, I 
explain how the solutions handle this prob¬ 
lem, called the address synonym problem. 2 

The solution of Wood et al., 8 imple¬ 
mented as part of the Symbolic Processing 
Using RISC (SPUR) workstation project at 
the University of California at Berkeley,
assumes a dual-address bus, where a 
memory request includes both the virtual 
and physical addresses of the referenced 
data. The bus monitor can thus query the 
cache with the transmitted virtual address. 
The synonym problem is avoided by as¬ 
suming a one-to-one mapping from pages 
to frames, which defines a single, global 
address space. 


This idea can be taken one step further to 
support a many-to-one mapping of pages 
to frames, increasing the ways memory can 
be shared. In Goodman’s solution, 7 the 
cache has two copies of the cache direc¬ 
tory, one accessed by the processor with 
virtual addresses and the other by the bus 
monitor with physical addresses. Thus, 
physical rather than virtual addresses can
trigger the bus monitor. The system binds 
and unbinds virtual addresses to physical 
addresses in response to bus transactions, 
solving the synonym problem. 

Stanford University has incorporated 
the solution proposed by Cheriton, Slav¬ 
enburg, and Boyle 6 in the design of the 
VMP multiprocessor, an experimental 
shared-memory, bus-based multiproces¬ 
sor. If the cache cannot satisfy a data re¬ 
quest, the frame containing the data is 
cached. An action table maintained by each 
bus monitor contains information for each 
cached page frame. Bus transactions trig¬ 
ger the bus monitors to ensure cache con¬ 
sistency. Since the bus monitors’ actions 
are keyed to frame numbers, the synonym 
problem is solved. 

When implemented in a bus-based 
multiprocessor, the overhead associated 
with these solutions is very low because 
only the modifying processor participates 
in the algorithm and its participation per 
page-table modification is small and con¬ 
stant. The bus monitors maintain consis¬ 


tency independently of the processors. As 
mentioned earlier, only the implementa¬ 
tions are limited to bus-based multiproces¬ 
sors; the algorithms are not. In any case, 
solutions to the TLB consistency problem 
are also needed for multiprocessor archi¬ 
tectures without virtual-address caches 
kept consistent by hardware. 

Solutions without hardware cache 
consistency. Six TLB consistency solu¬ 
tions can be used in multiprocessors with¬ 
out hardware cache consistency. Two so¬ 
lutions require essentially no hardware: 
TLB shootdown and read-locked TLBs. 
Three solutions — TLB shootdown, modi¬ 
fied TLB shootdown, and validation (and, 
if the targeted multiprocessor has only one 
cluster of memory modules, memory- 
based TLBs) — do not limit memory shar¬ 
ing among processes. But none of these 
solutions have the small overhead exhib¬ 
ited by the solutions described above. If 
the targeted architecture has sufficient 
bandwidth, however, memory-based 
TLBs and validation (which are meant for 
scalable multiprocessor architectures with 
multistage interconnection networks) 
come close to achieving this goal, since 
each meets the following criteria: 

• The participation of a processor modi¬ 
fying a page table is small and constant. 

• The participation of another processor 
is necessary only when a page affected by
an unsafe change is accessed from memory 
(memory-based TLBs) or when an incon¬ 
sistent TLB entry is used (validation). 

• Locks are placed on the smallest pos¬ 
sible entities. 

• Serialization is not introduced. 

• Explicit interprocessor communica¬ 
tion and synchronization are not required. 

TLB shootdown. Black et al. 10 have pro¬ 
posed an essentially hardware-independ¬ 
ent solution called TLB shootdown. This 
algorithm is included in Carnegie Mellon 
University’s Mach operating system, 
which has been ported to multiprocessor 
architectures including the BBN Butterfly, 
Encore’s Multimax, IBM’s RP3, and 
Sequent’s Balance and Symmetry systems. 
In this algorithm, a lock associated with 
each page table must be secured to modify 
either a page table or the list of processors 
that may be using the table. The following 
sequence describes generally how the al¬ 
gorithm works. 

• A processor that wants to modify a 
page table disables interprocessor inter¬ 
rupts and clears its active flag (such a flag 
is associated with each processor and indi¬ 
cates if the processor is actively using any 
page table). The processor then locks the 
page table, flushes TLB entries related to 
pages for which translation information is 
to be modified, enqueues a message for 
each interrupted processor describing the 
TLB actions to be performed, and sends an 
interprocessor interrupt to other proces¬ 
sors that might be using the page table. 

• When a processor receives the inter¬ 
rupt, it clears its active flag. 

• The modifying processor busy-waits 
until the active flags of all interrupted 
processors are clear and then modifies the 
page table. Finally, after releasing the 
page-table lock and setting its active flag, 
the modifying processor resumes execution.

• After clearing its active flag, each 
interrupted processor busy-waits until 
none of the page tables it is using are 
locked. Then, after executing the required 
TLB actions and setting its active flag, it 
resumes execution. 
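
The four steps above can be approximated with Python threads. This is a loose sketch under several assumptions, not the Mach implementation: there is a single page table, a `threading.Event` stands in for an interprocessor interrupt, disabling interrupts on the modifying processor is omitted, and all class and function names are hypothetical.

```python
import threading
import time


class Processor:
    def __init__(self, name):
        self.name = name
        self.active = True                    # actively using a page table
        self.tlb = {}                         # page -> frame
        self.queue = []                       # queued TLB actions (pages to invalidate)
        self.interrupt = threading.Event()    # stands in for an interprocessor interrupt


def responder(cpu, table_lock, stop):
    """Interrupt handler loop for a processor that may be using the page table."""
    while not stop.is_set():
        if cpu.interrupt.wait(timeout=0.01):
            cpu.interrupt.clear()
            cpu.active = False                # acknowledge the interrupt
            while table_lock.locked():        # busy-wait until the table is unlocked
                time.sleep(0)
            for page in cpu.queue:            # perform the queued TLB actions
                cpu.tlb.pop(page, None)
            cpu.queue.clear()
            cpu.active = True                 # resume execution


def shootdown(initiator, others, page_table, table_lock, page, new_frame):
    initiator.active = False
    table_lock.acquire()                      # lock the page table
    initiator.tlb.pop(page, None)             # flush the initiator's own entry
    for cpu in others:
        cpu.queue.append(page)                # enqueue the required TLB action
        cpu.interrupt.set()                   # send the interrupt
    while any(cpu.active for cpu in others):  # busy-wait for all acknowledgments
        time.sleep(0)
    page_table[page] = new_frame              # the unsafe change itself
    table_lock.release()
    initiator.active = True


page_table = {"a": "A"}
table_lock = threading.Lock()
stop = threading.Event()
x, y, z = Processor("X"), Processor("Y"), Processor("Z")
for cpu in (x, y, z):
    cpu.tlb["a"] = "A"
threads = [threading.Thread(target=responder, args=(cpu, table_lock, stop), daemon=True)
           for cpu in (y, z)]
for t in threads:
    t.start()
shootdown(x, [y, z], page_table, table_lock, "a", "B")
while not (y.active and z.active):            # wait for Y and Z to finish their actions
    time.sleep(0)
stop.set()
for t in threads:
    t.join()
```

Even in this toy version the cost structure is visible: the modifying processor loops over every other processor twice, and each interrupted processor idles for the whole duration of the modification.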

This algorithm idles all processors that 
may be using a page table while it is being 
modified. In addition, the modifying pro¬ 
cessor cannot make any modifications 
until all interrupted processors have
cleared their active flags. This synchroni¬ 
zation can be very costly for applications 


where many processes share data. Black et 
al. state that their algorithm could pose 
problems for large-scale systems because 
it scales linearly with the number of pro¬ 
cessors. Extrapolating measurements from 
an instrumented Mach kernel running on a 
16-processor Encore Multimax indicates 
that machines with a few hundred proces¬ 
sors might not experience a performance 
problem, except with regard to kernel 
space. To surmount this problem, Black et
al. suggest restructuring the operating
system’s use of memory so that TLB shootdowns
are limited to groups of processors
rather than all processors. Note, however, 
that parallel programs with the same degree 
of sharing as exhibited by the operating 
system will encounter the same perform¬ 
ance problems as the operating system. 

Modified TLB shootdown. As 
Rosenburg explains, 11 the level of syn¬ 
chronization exhibited by the original TLB 
shootdown algorithm is necessitated by 
architectural features of multiprocessors 
to which Mach is targeted. Rosenburg’s 
version of this algorithm uses features of 
IBM’s RP3 and requires less synchroniza¬ 
tion. Experiments on the RP3 show that the 
time spent by the interrupted processors 
can remain constant even as the number of 
processors grows. In particular, processors 
using a page table need not busy-wait while 
the table is being modified, and a processor 
can modify the table without waiting for 
other processors to synchronize. In addi¬ 
tion, even though page-table modifications 
are serialized, processors can use a shared 
page table during a modification, since 
modifications are performed atomically. 
The semantics of this algorithm are slightly 
different from the original TLB shootdown 
algorithm, since at any given moment one 
processor can use a page’s old translation 
while another processor uses a new one.
Rosenburg’s algorithm has been imple¬ 
mented in the version of Mach running on 
the RP3. In both versions: 

• The participation of a modifying proc¬ 
essor grows with the number of processors 
involved in the algorithm. 

• All processors that
might have a TLB entry that will become 
inconsistent (or that is inconsistent, in the 
case of modified TLB shootdown) as a 
result of an unsafe change are forcibly 
interrupted, whether or not they have used 
or will use the translation information in 
question. 

• Parallelism is limited, since only one 
process can modify a given page table at 


one time and processors must synchronize 
and communicate with one another. 

• The scheduling of a process is delayed 
if a page table it may use is being modified. 

TLB shootdown ensures TLB consis¬ 
tency no matter how memory is shared, 
since processors cannot use translation 
information associated with a page table 
while the table is being modified. As 
mentioned above, Rosenburg’s version has 
slightly different semantics and requires 
extra hardware support, but it is as effec¬ 
tive and performs better. 

TLB shootdown is a good solution to the 
TLB consistency problem for multiproces¬ 
sors with no hardware support for this 
purpose. In addition, if unsafe changes 
rarely occur and memory is not widely 
shared among processes, then either ver¬ 
sion of the algorithm should perform well. 
Otherwise, the overhead of these solutions 
is likely to adversely affect multiprocessor
performance. 

Lazy devaluation. Why not postpone 
invalidating TLB entries until absolutely 
necessary? Thompson et al. 12 proposed 
this strategy, called lazy devaluation, to 
ensure TLB consistency when virtual 
memory is deallocated and when page 
protection is increased. Lazy devaluation 
has been implemented in a multiprocessor 
architecture based on the MIPS R2000 
processor running the System 5.3 Unix 
operating system. Although this solution is 
effective for the targeted multiprocessor 
system, as observed by Black et al., 10 nei¬ 
ther this Unix implementation nor this TLB 
consistency solution supports multiple 
processes executing in the same address 
space or allows a process to modify the 
address space of another executing process.

The MIPS R2000 gives a TLB entry a 
six-bit field for an address-space identi¬ 
fier, so TLBs need not be flushed on con¬ 
text switches. To implement lazy devalu¬ 
ation, TLB identifiers (TLBIDs) associated 
with active processes identify when TLB 
entries on a particular processor must be 
replaced or invalidated. A TLB reload also 
loads the TLBID of the executing process
into the TLB entry’s address-space identi¬ 
fier field. A TLB hit occurs when the refer¬ 
enced virtual address and the TLBID of the 
referencing process match the TLB entry’s 
virtual address and TLBID. 

When a process releases a region of 
virtual memory, a new TLBID is assigned 
to that process, making any TLB entries 
associated with the released space inaccessible
to the process. When TLBIDs are
recycled, all TLBs must be flushed. The 
eviction of a page is postponed until all 
related TLB entries are invalidated via a 
systemwide TLB flush. The remapping of 
a page is postponed until no TLB contains 
an entry for the page. If the remapping 
cannot be stalled, then all TLBs are flushed 
so that the change can occur. To handle 
increases in page protection, a data struc¬ 
ture associated with each process or each 
TLBID tracks the processors that might 
have inconsistent TLB entries. When a 
process migrates to such a processor, those 
entries are flushed from its TLB. 
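
The TLBID mechanism for deallocation can be sketched as follows. The single dictionary standing in for one processor's TLB, the unbounded identifier counter, and all names are assumptions; in particular, the sketch omits the recycling of identifiers (and the TLB flush it requires) as well as the handling of eviction, remapping, and protection increases.

```python
import itertools

next_tlbid = itertools.count(1)               # assumed unbounded; recycling is omitted


class Process:
    def __init__(self):
        self.tlbid = next(next_tlbid)


tlb = {}                                      # (TLBID, page number) -> frame number


def reload(process, page, frame):
    """A TLB reload stores the process's TLBID alongside the translation."""
    tlb[(process.tlbid, page)] = frame


def lookup(process, page):
    """A hit requires both the page number and the TLBID to match."""
    return tlb.get((process.tlbid, page))     # None means a TLB miss


def release_region(process):
    """Give the process a new TLBID; its old entries are not erased,
    they simply can no longer match."""
    process.tlbid = next(next_tlbid)


p = Process()
reload(p, 10, 3)
before = lookup(p, 10)
release_region(p)
after = lookup(p, 10)
```

No TLB is touched when the region is released, which is the sense in which the invalidation is lazy: the stale entry lingers until it is replaced or the TLB is flushed.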

This approach has problems when pro¬ 
cesses execute in parallel within the same 
address space or when a process modifies 
the address space of another process that 
might be executing concurrently on an¬ 
other processor. When virtual memory is 
deallocated, modifying a process’ TLBID 
makes associated TLB entries inaccessible 
to that process but not to other processes 
with the same address space. The proce¬ 
dure suggested for maintaining TLB con¬ 
sistency in the face of increases in protec¬ 
tion is only effective for a modifying 
processor’s TLB. It does not work if proc¬ 
esses executing concurrently on other 
processors share the same address space, 
although it can be used to implement the 
copy-on-write optimization. 

If processes do not migrate frequently 
and unsafe changes rarely occur, then lazy 
devaluation could be efficient for systems 
similar to its target system. In fact, prelimi¬ 
nary performance figures indicate this 
approach succeeds for the target system 
without excessive TLB flushing. How¬ 
ever, lazy devaluation could adversely 
affect multiprocessor performance if un¬ 
safe changes are not rare events. To im¬ 
prove performance, Thompson et al. sug¬ 
gest maintaining extra information to al¬ 
low selective invalidation of TLB entries 
rather than systemwide TLB flushes. In 
either case, interprocessor communication 
and synchronization are required. In addi¬ 
tion, lazy devaluation does not support 
general memory-sharing patterns. 

Read-locked TLBs. Preventing a TLB 
entry from becoming invalid prevents TLB 
inconsistency. This is the premise of read-locked
TLBs, proposed by myself, Kenner,
and Snir. 13 This strategy prohibits an un¬ 
safe change while a valid copy of the trans¬ 
lation information to be modified resides 
in one or more TLBs. In essence, a proces¬ 
sor maintains a read lock on a page-table 
entry as long as the page’s translation in¬ 


formation resides in its TLB. Thus, TLB 
entries never become inconsistent, and 
their modification and use are serialized.
However, some unsafe changes might have 
to be postponed, and memory sharing 
among processes is limited. 

To implement read-locked TLBs, a 
counter associated with each resident page 
is incremented on a TLB reload and decre¬ 
mented on a TLB invalidation or replace¬ 
ment of the page’s translation information. 
This increase in the cost of TLB manage¬ 
ment constitutes this solution’s explicit 
overhead. If the page-replacement policy 
inherently adopted by read-locked TLBs 
proves effective, this overhead can be 
partially attributed to the cost of virtual 
memory management. On the other hand, 
this solution could incur additional over¬ 
head if its restrictions increase the number 
of page-ins and page faults or if the inabil¬ 
ity to use the copy-on-write optimization 
causes unnecessary page copying. 

Our strategy dictates that a page is not a 
candidate for eviction if any TLB contains 
a valid entry for it. Sequent’s Dynix oper¬ 
ating system uses a similar strategy, where 
a page is not a candidate for eviction if it is 


mapped to the address space of an active 
process. Virtual memory can be deallo¬ 
cated at any time, but a mapping for a 
deallocated page must not be used after the 
virtual page has been reallocated. We can 
assure this by postponing the reuse of a 
virtual page until no TLB contains an entry 
for it, except the TLB of the processor 
reusing the page. When these unsafe 
changes cannot be postponed, we must use 
an alternate strategy, such as TLB
shootdown. 
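
The counter scheme and the resulting eviction rule can be sketched in a few lines. The global `Counter`, the dictionaries standing in for per-processor TLBs, and the function names are assumptions for illustration; a real implementation would need atomic counter updates, which this single-threaded sketch does not model.

```python
from collections import Counter

tlb_refs = Counter()                          # page -> number of TLBs holding a valid entry


def tlb_reload(tlb, page, frame):
    if page not in tlb:
        tlb_refs[page] += 1                   # acquire the read lock
    tlb[page] = frame


def tlb_invalidate(tlb, page):
    """Also used when an entry is replaced to make room for another."""
    if tlb.pop(page, None) is not None:
        tlb_refs[page] -= 1                   # release the read lock


def can_evict(page):
    """A page is a candidate for eviction only if no TLB holds an entry for it."""
    return tlb_refs[page] == 0


tlb_x, tlb_y = {}, {}
tlb_reload(tlb_x, "a", 3)
tlb_reload(tlb_y, "a", 3)
held_by_two = can_evict("a")
tlb_invalidate(tlb_x, "a")
held_by_one = can_evict("a")
tlb_invalidate(tlb_y, "a")
held_by_none = can_evict("a")
```

The unsafe change is postponed rather than forced: eviction of page a becomes possible only after the last TLB entry for it disappears.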

Like lazy devaluation, the read-locked 
TLB strategy is not effective if processes 
execute in the same address space or if 
address spaces are modified remotely. 
Unlike lazy devaluation, this strategy does 
not require flushing TLBs when a page is 
evicted, but it might be forced to forgo the
copy-on-write optimization. Either solu¬ 
tion is suitable for a multiprocessor with 
restricted memory sharing if unsafe 
changes involving many processors and 
incurring an overhead similar to that of 
TLB shootdown do not occur frequently. 
In addition, techniques in each solution 
can effectively reduce performance over¬ 
head in other TLB consistency solutions. 
Figure 5. A multiprocessor with hardware support for validation.


Memory-based TLBs. If TLBs are asso¬ 
ciated with memory modules rather than 
processors, a TLB reload does not require 
network traffic. My colleagues and I sug¬ 
gested such a configuration 13 (see Figure 
4). Since translation is done at the memory, 
general-purpose caches associated with 
each processor must be virtual-address 
caches. A memory request contains the 
virtual rather than physical address of the 
referenced data, so the virtual address 
defines the targeted memory module. If 
physical memory is organized into more 
than one cluster of memory modules, the 
number of frames to which a page can be 
mapped is limited; when a page resides in 
main memory, it is assumed to be stored in 
a specific cluster of memory modules. This 
implies that the page-replacement policy 
applies to each cluster independently. 

The TLB is implemented as a cache with 
a bus monitor, as proposed by Goodman. 7 
Within each memory cluster, the TLBs and 
the cluster page table are interconnected 
by a bus. A bus protocol 9 ensures consis¬ 
tency among a cluster’s TLBs and page 
table. Memory access can overlap address 
translation, since this bus-based subsys¬ 
tem is independent of the path to main 
memory. 


Remapping of virtual or physical mem¬ 
ory is transparent to a processor, but logic 
at the memory must alert the processor to 
other unsafe changes. That is, a processor 
must be informed if a referenced page is 
not resident, if its protection has been in¬ 
creased, or if the page has been deallo¬ 
cated. In response, the processor executes 
a corresponding trap routine. 

To solve the synonym problem (since 
multiple, possibly independent, processes 
can access the same TLB), the address pre¬ 
sented to a TLB must include an address- 
space identifier, or a single, global address 
space must be assumed. In addition, the 
existence of more than one cluster of mem¬ 
ory modules compromises the flexibility 
with which memory can be shared, since 
virtual pages that map to the same physical 
page must map to the same cluster. 

If unsafe changes are not rare events or
pages are widely shared among processes,
keeping the performance overhead small might be
important. Memory-based TLBs keep it small:
No processor participates in the algorithm,
and network traffic is generated
only to inform processors of unsafe 
changes. In addition, this solution affords 
a large degree of parallelism in maintain¬ 
ing TLB consistency. 


However, other issues can affect per¬ 
formance: 

• Since the TLBs are interconnected by 
a bus, the number of memory modules per 
cluster is limited. 

• As the number of TLBs in a cluster 
grows, bus traffic increases if the intra- 
cluster rate at which unsafe changes or 
TLB misses occur also grows. This in¬ 
crease in traffic could raise the cost of a 
TLB reload and lengthen the time to satisfy 
a memory request. 

• If the path between the processors and 
memory is limited so that the size of the 
virtual address increases the average mes¬ 
sage length, the time to satisfy a memory 
request could increase. 

• Since more than one processor can 
access any one TLB, TLBs might have to 
be larger than those accessed by only one 
processor. 

If these side effects do not degrade per¬ 
formance, then memory-based TLBs 
should perform well in a multiprocessor 
architecture with the necessary hardware 
support, even if unsafe changes are fre¬ 
quent and memory is widely shared among 
processes. However, if there is more than 
one cluster of memory modules, the func¬ 
tion of shared memory is somewhat lim¬ 
ited and virtual-memory management is 
restricted. 

Validation. Rather than invalidating 
inconsistent TLB entries to prevent their 
use, my colleagues and I suggested another 
approach, called validation. In this strat¬ 
egy, translation information is validated 
before it is used to access data stored in 
memory. This is accomplished by associ¬ 
ating a generation count with each frame 
and modifying the count whenever an 
unsafe change affects the page mapped to 
that frame. A memory request is aug¬ 
mented with the generation count stored in 
the issuing processor’s TLB entry for the 
referenced page. A frame’s most recent 
generation count is stored in the validation 
tables associated with a memory cluster 
(see Figure 5). In addition, possibly out¬ 
dated generation counts are stored in TLBs 
and in the page-frame table, which has an 
entry for each page residing in physical 
memory. Whether the translation informa¬ 
tion used by the processor is stale is deter¬ 
mined by comparing the generation count 
transmitted with the memory request with 
the generation count stored in the targeted 
memory module’s validation table. If the 
information is not stale, the memory request
is satisfied; otherwise, the processor
is notified to invalidate the referenced 
page’s TLB entry. The validation of the 
transmitted generation count and memory 
access is assumed to be atomic. 

A cluster’s validation table is replicated 
at each memory module in the cluster. 
(Assume that a frame is interleaved across 
the memory modules in a cluster.) Before 
an unsafe change is made, the modifying 
processor multicasts a message to the 
memory modules in the cluster where the 
page resides. The receipt of this message 
initiates the updating of the generation 
count associated with the frame where the 
page is stored. It also maintains consis¬ 
tency among a cluster’s validation tables. 
The change commences only after the 
tables are updated. After the unsafe change 
is complete, the generation count stored in 
the page-frame table is also updated. A 
TLB reload, in addition to loading transla¬ 
tion information stored in a page table, 
loads a generation count stored in the page- 
frame table. In this way, no reference to the 
page can be satisfied until the referencing 
processor has a current image of the trans¬ 
lation information stored in its TLB. This 
makes validation effective no matter how 
memory is shared. 
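
The protocol can be made concrete with a small simulation. The sketch below is only an illustration under simplifying assumptions: one cluster, one validation table rather than a replicated one, unbounded generation counts (so wraparound never occurs), and hypothetical names (`Cluster`, `Processor`, `unsafe_change`) that do not come from the article.

```python
class Cluster:
    """One memory cluster with a validation table and a page-frame table."""

    def __init__(self, frames):
        self.validation = {f: 0 for f in frames}   # most recent counts, at memory
        self.frame_table = {f: 0 for f in frames}  # counts loaded on a TLB reload
        self.data = {f: None for f in frames}

    def unsafe_change(self, frame, change):
        self.validation[frame] += 1                # update the table first
        change()                                   # then make the unsafe change
        self.frame_table[frame] = self.validation[frame]

    def access(self, frame, generation):
        # Validation and memory access are assumed to be atomic.
        if generation != self.validation[frame]:
            return "stale", None
        return "ok", self.data[frame]


class Processor:
    def __init__(self, cluster, page_table):
        self.cluster = cluster
        self.page_table = page_table               # page -> frame
        self.tlb = {}                              # page -> (frame, generation)
        self.stale_round_trips = 0

    def reload(self, page):
        frame = self.page_table[page]
        self.tlb[page] = (frame, self.cluster.frame_table[frame])

    def read(self, page):
        if page not in self.tlb:
            self.reload(page)
        status, value = self.cluster.access(*self.tlb[page])
        if status == "stale":                      # one extra network round-trip
            self.stale_round_trips += 1
            del self.tlb[page]
            self.reload(page)
            status, value = self.cluster.access(*self.tlb[page])
        return value


cluster = Cluster([0, 1])
cluster.data[0], cluster.data[1] = "old", "new"
page_table = {"A": 0}
cpu = Processor(cluster, page_table)
cpu.read("A")                                      # caches (frame 0, count 0)
cluster.unsafe_change(0, lambda: page_table.update({"A": 1}))
print(cpu.read("A"), cpu.stale_round_trips)
```

Note that the modifying side never contacts the processor holding the stale entry; the stale entry is discovered only when it is used.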

To implement validation, however, a 
frame’s generation count must be updated 
carefully. If a generation count is incre¬ 
mented to change its value, a problem 
arises when the generation count wraps 
around: There could be an inconsistent 
TLB entry associated with the newly up¬ 
dated generation count value, and this 
inconsistent TLB entry will appear to be 
consistent. We can solve this problem in 
several ways: 

• Use long generation counts, and use a 
TLB shootdown-like algorithm when 
wraparound occurs to invalidate relevant 
TLB entries. 

• Bound the time interval during which a 
TLB entry is guaranteed to be safe with 
respect to generation-count wraparound, 
and invalidate the entry before the interval 
ends. 

• Maintain a set of counters for each 
frame, one for each allowable generation- 
count value. Each counter has N bits, where 
N is the number of processors in the sys¬ 
tem. A TLB reload increments a counter, 
and invalidating or replacing a TLB entry 
decrements a counter. The generation 
count stored in the TLB entry selects the 
counter of the related frame to be incre¬ 
mented or decremented. A frame’s genera¬ 
tion count is updated by assigning it a 


value associated with a zero counter. If 
there are N counters, there is always a zero 
counter. If small generation counts are 
used, however, and if each allowable value 
of a frame’s generation count is present in 
at least one TLB (which is unlikely, since 
stale TLB entries are likely to be replaced 
during program execution), then a TLB 
shootdown-like algorithm must be used to 
create a zero counter. 
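
The third option can be sketched as a small data structure kept per frame. This is an illustrative model only; the class name, the method names, and the choice to return `None` when no zero counter exists are assumptions, not details from the article.

```python
class FrameGenerations:
    """Hypothetical per-frame state: one holder counter per generation value."""

    def __init__(self, num_values):
        self.current = 0
        # holders[g] = number of TLB entries for this frame carrying count g
        self.holders = [0] * num_values

    def tlb_reload(self):
        self.holders[self.current] += 1
        return self.current          # the count stored in the TLB entry

    def tlb_drop(self, generation):
        # Invalidation or replacement of a TLB entry carrying `generation`.
        self.holders[generation] -= 1

    def next_generation(self):
        """Assign a value no TLB holds; None means a shootdown-like step is needed."""
        for value, count in enumerate(self.holders):
            if count == 0:
                self.current = value
                return value
        return None


frame = FrameGenerations(num_values=2)
first = frame.tlb_reload()           # some TLB now holds count 0
print(frame.next_generation())       # picks the unused value
```

Because a stale entry decrements its counter as soon as it is replaced or invalidated, values become reusable without any processor being interrupted.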

The overhead is low for several reasons. 

• A modifying processor acts independ¬ 
ently of other processors, and its participa¬ 
tion translates into a small and constant 
overhead that involves only memory ac¬ 
cesses. 

• A processor’s participation is enlisted 
only when it uses an inconsistent TLB 
entry, and this participation has a small 
overhead (one extra round-trip through the 
network). 

• It is likely that stale entries will be 
replaced before they are used. 

• Participating processors act independ¬ 
ently of one another. 

• Each processor could simultaneously 
participate in ensuring the consistency of a 
TLB entry associated with a distinct page. 

Thus, even if unsafe changes are not rare 
events and memory is widely shared 
among processes, validation should per¬ 
form well in multiprocessor architectures 
with the necessary hardware support. Vali¬ 
dation has two drawbacks, however. First, 
if the network bandwidth is not sufficient 
to transmit a generation count with each 
memory request without increasing the 
number of packets that comprise a mes¬ 
sage, the average time to satisfy a memory 
request could increase. Second, a solution 
to the generation-count wraparound prob¬ 
lem must be implemented. 


A solution to the TLB consistency 
problem for a particular multi¬ 
processor system must support 
the defined functionality of shared mem¬ 
ory and should not degrade the speedups 
attainable by application programs. Since 
we do not know what behavior will be 
exhibited by programs executing in differ¬ 
ent multiprocessor architectures, we can 
only estimate which solution is best suited 
for a scalable multiprocessor with a multi¬ 
stage network. In particular, the following 
issues confront us: 


• None of the proposed solutions has 
been implemented in a large-scale multi¬ 
processor system. 

• We do not know how either the rate of 
unsafe changes or the number of TLBs 
affected by such modifications will change 
as the number of processors, N, increases. 

• It is not clear how much parallelism is 
needed in consistency-ensuring TLB man¬ 
agement. In a bus-based multiprocessor 
employing one of the solutions that re¬ 
quires virtual-address caches and hard¬ 
ware cache consistency, TLB management 
is serial, that is, only inconsistencies 
caused by one page-table modification are 
corrected at any one time. In an architec¬ 
ture with a multistage network, inconsis¬ 
tencies caused by a number of unsafe 
changes can be corrected in parallel, the 
number depending on the solution. 

• There are questions about how to 
measure a problem’s execution speedup 
and, thus, how to evaluate the performance 
of solutions to the TLB consistency prob¬ 
lem. Do we look at the behavior of a cer¬ 
tain-sized problem executing on one pro¬ 
cessor versus its behavior executing on N 
processors? Or do we look at its behavior 
versus the behavior of that same problem 
with a size that grows with N, executing on 
N processors? 

It seems reasonable to increase the prob¬ 
lem size with N. In this case, I feel the rate 
of unsafe changes and the number of pro¬ 
cesses sharing memory will increase as N 
increases. Therefore, algorithms whose 
performance depends on N will not scale 
well. If this is the case, then TLB 
shootdown, modified TLB shootdown, 
lazy devaluation, and read-locked TLBs 
are not good solutions for scalable multi¬ 
processor architectures. In contrast, the 
memory-based TLBs strategy exhibits the 
following characteristics, which yield 
good performance in scalable architectures:


• A processor is not interrupted to par¬ 
ticipate in the solution. 

• Little extra communication is required 
to maintain TLB consistency. 

• No processor synchronization is re¬ 
quired. 

• Parallelism is inherent in the algorithm 
it implements. 

But, if the memory modules are organ¬ 
ized in more than one cluster, then the 
memory-based TLBs strategy does not 
support all memory-sharing patterns and 
restricts virtual-memory management. 
Therefore, validation seems the most
promising solution for scalable multi¬ 
processor architectures, since it has all the 
characteristics of memory-based TLBs, 
although extra communication costs can 
be inherent in its implementation. Specifi¬ 
cally: 

• If the transmission of a generation 
count with each memory request increases 
the average network latency, then the time 
required to satisfy a memory request in¬ 
creases. 

• When a processor uses an inconsistent 
TLB entry, a round-trip through the net¬ 
work is required to discover that the entry 
is stale. 

• Communication might be required to 
solve the generation-count wraparound 
problem. 

We cannot decisively conclude which
solutions are best suited to scalable multi¬ 
processor architectures until we know how 
programs behave in these environments. 
The solutions described demonstrate ways 
of solving the TLB consistency problem. 
They may either prove to be efficient solu¬ 
tions in multiprocessor systems or provide 
a basis on which to design other solutions 
to this problem. ■ 
In recent years, multiprocessor architecture
has assumed an important
role in high-speed computing as a 
way of increasing performance over that of 
uniprocessor systems. However, as the 
number of processors increases, the mem¬ 
ory access time — the time for data and 
instructions to travel between the shared 
memory and the processors — increases 
due to memory conflicts and the limited 
throughput of the interconnection media. 

A cache memory can reduce the average 
memory access time. However, before we 
can use private caches in large-scale multi¬ 
processor systems, we must solve the 
cache coherence problem. The coherence 
problem arises if several caches can con¬ 
tain a copy of the same memory location 
and a read from one of them is not guaran¬ 
teed to produce the latest value of the 
location. In this article, we discuss why we 
need to find alternatives to hardware-based 
cache coherence strategies for large-scale 
multiprocessor systems. Then, we present 
three different software-based strategies 
that share the same goals and general ap¬ 
proach. 

Why study software-managed 
caches? Several hardware schemes have 
been proposed for cache coherence en¬ 
forcement in multiprocessor systems. 
Most of them only apply to bus-based systems. 1
Others use a directory scheme, either
centralized or distributed, 2 to maintain
coherence. Bus architectures are not scal¬ 
able to a large number of processors. Nei¬ 
ther can the bus-based approach be used in 
systems using multistage interconnection 
networks. The central directory is a serious 
performance bottleneck and also is not 


scalable to a large number of processors. 
The distributed directory schemes that 
require a presence vector with a bit-per- 
processor per memory line are not scal¬ 
able. 

The only viable scheme is a distributed 
directory scheme that does not require the 
presence vector. 2 Without knowing the 
identity of the caches that contain a line, 
this scheme has to use a broadcast to all 
caches when the line status changes or an 
up-to-date line is requested. Broadcasting 
reduces bandwidth of the interconnection 
network. This directory scheme requires 
complicated protocols that can cause la¬ 
tency to increase considerably. 

On the surface, the cache coherence 
problem can be solved easily if shared data 
are not cached. However, as pointed out in 
the work by F. Darema-Rogers et al., 3 
shared-data accesses account for a large 
portion of global memory accesses. If 
shared data are not cached, the perfor¬ 
mance will suffer. 

We are interested only in strategies suit¬ 
able for shared-memory multiprocessor 
systems with interconnection networks 
and a large number of processors. Of all the 
schemes mentioned above, only one 
scheme might suit such a system. There¬ 
fore, we need an alternative. A scalable, 
efficient cache coherence scheme for 
large-scale systems must: 
Figure 1. An example of a task graph (a) and its levels (b).


(1) eliminate runtime communication 
for coherence maintenance, 

(2) require the cost of hardware for 
coherence maintenance to be a very 
slow-growing function of the num¬ 
ber of processors only, not the 
memory size, and 

(3) reduce the total time needed to in¬ 
validate cache lines. 

We believe compiler assistance can help 
us accomplish the above goals. The first 
requirement can be achieved by using 
compile-time analysis to obtain informa¬ 
tion on accesses to a given line by multiple 
processors. Such information can allow 
each processor to manage its own cache 
without interprocessor runtime communi¬ 
cation. This will also allow us to achieve 
the second goal. Compiler-directed man¬ 
agement of caches implies that a processor 
has to issue explicit instructions to invali¬ 
date cache lines. However, if done a line at 
a time, the total time for maintaining co¬ 
herence becomes excessive. To achieve 
the third goal requires using schemes with 
special hardware support to invalidate 
stale data efficiently so the time cost is 
independent of the number of invalidated 
lines. 

These attributes are the essence of the 
three software-based schemes discussed in 
this article. The observation central to all 
of these schemes is that the contents of 
private caches and the shared memory can 
differ as long as incorrect data are not used 


by a processor. This observation relaxes 
the requirement used in hardware-based 
schemes that every write must be made 
known to all caches that contain a copy of 
a line. Therefore, it eliminates the need for 
communication. 

A parallel task-execution model and 
task graph. We will concentrate on main¬ 
taining cache coherence in the execution of 
parallel Fortran programs. We assume that 
the execution of a parallel program (paral¬ 
lelized from a sequential program or writ¬ 
ten in a parallel language) is represented by 
tasks, each executed by a single processor. 
Task migration is not allowed. Tasks inde¬ 
pendent of each other can be scheduled for 
parallel execution. Dependent tasks will 
be executed in the order defined by pro¬ 
gram semantics. The execution order of 
dependent tasks is enforced through syn¬ 
chronization. 

The dependence relationship among 
tasks, and hence the execution order, can be 
described by a task graph, $G = (E, T)$, a directed
graph where $E$ is a set of edges and $T$
is a set of nodes. A node, $T_i \in T$, represents
a task, and a directed edge, $e_{ij} \in E$, represents
that some statements in $T_j$ depend on
other statements in $T_i$ (see Figure 1a). $T_i$ in
such a case is called the parent node of $T_j$,
and $T_j$ is called the child node of $T_i$.

Task nodes are combined into a single 
node using the following criterion: Two
nodes $T_i$ and $T_j$ connected by an edge $e_{ij}$ can
be combined into one node if $T_i$ is the only
parent node of $T_j$, and $T_j$ is the only child
node of $T_i$.

The task graph can be divided into levels 
$L = \{L_0, \ldots, L_n\}$, where each $L_i$ is a set of
tasks such that the longest directed path
from $T_0$, the starting node, to each of the
tasks in the set has $i$ edges (see Figure 1b).
Tasks on the same level are not connected 
by any directed edges. Therefore, tasks on 
the same level perform no write accesses 
or read-writes to the same data by different 
processors. Such tasks can be executed in 
parallel without intertask synchronization. 
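
The level assignment is a longest-path computation on a directed acyclic graph. The following sketch shows one way to compute it; the function name and the edge-list input format are illustrative assumptions, and the graph is assumed to be acyclic with every task reachable from the starting node.

```python
from collections import defaultdict, deque


def task_levels(edges, start):
    """Group tasks by the length of the longest directed path from `start`."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {start}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    level = {node: 0 for node in nodes}
    ready = deque(node for node in nodes if indegree[node] == 0)
    while ready:                       # visit tasks in topological order
        node = ready.popleft()
        for child in children[node]:
            level[child] = max(level[child], level[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    levels = defaultdict(set)
    for node, i in level.items():
        levels[i].add(node)
    return [levels[i] for i in sorted(levels)]


edges = [("T0", "T1"), ("T0", "T2"), ("T1", "T3"), ("T2", "T3"), ("T0", "T3")]
for i, tasks in enumerate(task_levels(edges, "T0")):
    print(i, sorted(tasks))
```

In this example the direct edge from T0 to T3 does not place T3 on level 1, because the longest path to T3 has two edges.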

Let us assume that parallelism in a pro¬ 
gram is expressed in terms of parallel 
loops. A parallel loop specifies starting 
execution of iterations of the loop by mul¬ 
tiple processors. In a Doall type of parallel 
loop, all such iterations are independent 
and can be executed in any order. In a 
Doacross type loop, there is a dependence 
between iterations. In terms of tasks, one 
or more iterations of a Doall loop are 
bundled into a task. In a Doacross loop, one 
iteration is a task, and synchronization 
exists between tasks. 

Assumptions. We assume a weakly 
ordered system. 4 While it does not guaran¬ 
tee sequential consistency, the program 
model is quite simple and has performance 
much higher than for strongly ordered 
systems. In terms of our task-execution 
model, this implies that the values written 
in a task level must be deposited in the 
shared memory before the task-level 
boundary can be crossed. 

For clarity, the following discussion 
focuses on parallel task execution without 
intertask dependence. However, the cache 
coherence schemes we’ll discuss in this 
article can be applied to parallel execution 
with intertask synchronization. 

We don’t address the questions of line 
sizes and write policy in this article. We 
will assume a line size of one word and the 
write-through policy. However, the 
schemes presented can be adapted to different
line sizes and write policies.

The algorithms we present have been 
implemented in the Parafrase restructurer 5 
and the resulting code used for simula¬ 
tions. Because Parafrase does not perform 
interprocedural analysis, we only compile- 
simulate one subroutine at a time, even 
though our analysis can be done interpro- 
cedurally. Because of this, a subroutine is 
assumed to start with an empty cache. 

The memory references of a program 
consist of instruction fetches, private-data 
accesses, and shared-data accesses. Only 
the latter require coherence enforcement 
for writable data. Private data may only
become a problem if task migration is 
considered. We assume that instruction, 
private data, and shared read-only data 
accesses can be recognized at runtime and 
will not be affected by the cache coherence 
mechanism. 

A simple invalidation 
approach 

Low invalidation overhead and simplic¬ 
ity characterize the first scheme for main¬ 
taining cache coherence. 6 The scheme 
assumes the value in the shared memory is 
always current and defines an incoherence 
as a condition in which 

(1) a processor performs a memory 
fetch of a value X, and 

(2) a cache hit occurs, but the cache has 
a value different from that in memory. 

An incoherence cannot occur if the access 
is a store. Note that we require a processor 
to try to fetch X; otherwise, the fact that the 
memory and the cache have different val¬ 
ues is not an error. The necessary condi¬ 
tions for the cache incoherence to occur on 
a fetch of X require that 

(1) a value of X is present in the cache of 
processor P, and 

(2) a new value has been stored in the 
shared memory by another processor after 
the access by P that brought X into the 
cache. 

The above conditions can be formulated 
in terms of data dependencies, and a com¬ 
piler can then check for a dependence 
structure that might result in coherence 
violations. This is rather complex, how¬ 
ever, because the test will have to be per¬ 
formed for every read reference. In addi¬ 
tion, the data dependence information does 
not specify whether the references in¬ 
volved are executed by different proces¬ 
sors. To simplify the analysis and to get the 
processor information, we propose to use 
the type of parallel loop. The compiler has 
already performed data dependence analy¬ 
sis to determine the loop type, and proces¬ 
sor assignment is part of the loop execution 
model. Let us consider programs with 
Doall and Doacross loops. By definition, 
any dependence between two statements 
inside a Doall loop is not across iterations, 
but cross-iteration dependencies are present
in Doacross. It follows that a statement
$S_i$ in a Doall dependent on a statement $S_j$ in
the same loop is executed on the same
processor as $S_j$. In a Doacross loop, two
statements with a cross-iteration depen¬ 
dence are executed on different proces¬ 
sors, whereas statements with a depend¬ 
ence on the same iteration are executed on 
the same processor. 

A cache management algorithm. Let
us assume that the following instructions
are available for cache management:

Invalidate. This instruction invalidates 
the entire contents of a cache. Using reset¬ 
table static random-access memories for 
valid bits, this can be accomplished in one 
or two clocks with low hardware cost. 

Cache-On. This instruction causes all 
global memory references to be routed 
through the cache. 

Cache-Off. This instruction causes all 
global memory references to bypass the 
cache and go directly to memory. 

The cache state, on or off, must be part of 
the processor state and must be saved/ 
restored on a context switch. Processes are 
created in the cache-off state. 

The algorithm uses loop types for its 
analysis as follows: 

(1) A Doall loop has no dependencies 
between statements executed on different 
processors. Therefore, any shared-mem¬ 
ory access in such a loop can be cached. 
Caching is turned on. 

(2) A serial loop is executed by a single 
processor, and shared-memory accesses 
can be cached. Caching is turned on. 

(3) Doacross or recurrence loops do 
have cross-iteration dependencies. There¬ 
fore, conditions for incoherence can be 
true. Caching is turned off. 

(4) An Invalidate instruction is exe¬ 
cuted by each processor entering a Doall or 
a Doacross. The processor continuing 
execution after a Doall also executes an 
Invalidate instruction. In terms of a task 
graph, these points are equivalent to task- 
level boundaries. 
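
The four rules can be read as a simple pass over the sequence of loops in a subroutine. The sketch below is a deliberately flattened illustration, not the Parafrase implementation: the function name, the string-based loop kinds, and the flat output listing are all assumptions made for the example.

```python
def annotate(loop_kinds):
    """Return a flat listing of loops and the cache instructions around them."""
    known = {"doall", "doacross", "recurrence", "serial"}
    listing = []
    caching = None                      # processes are created cache-off
    for kind in loop_kinds:
        if kind not in known:
            raise ValueError(f"unknown loop kind: {kind}")
        # Rules 1-3: cache in Doall and serial loops, bypass otherwise.
        wanted = "Cache-Off" if kind in ("doacross", "recurrence") else "Cache-On"
        if wanted != caching:
            listing.append(wanted)
            caching = wanted
        # Rule 4: each processor entering a Doall or Doacross invalidates.
        if kind in ("doall", "doacross"):
            listing.append(f"{kind}: Invalidate on entry")
        else:
            listing.append(kind)
        # Rule 4: the processor continuing after a Doall also invalidates.
        if kind == "doall":
            listing.append("Invalidate")
    return listing


for line in annotate(["doall", "doacross", "serial"]):
    print(line)
```

The pass needs only the loop type, which the compiler has already determined, so no per-reference dependence test is required.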

Consider the program example in Figure 
2. At the beginning, every processor exe¬ 
cutes the Cache-On instruction. Note that 
cache management instructions inserted in 
parallel loops are executed once by every 
participating processor, not on every itera¬ 
tion of such a loop. 

Cache-On
Doall i = 1, n
    Invalidate
    Y(i) = ...
    = W(i) ... Y(i)
    = ... X(i)
enddo
Invalidate

Doall j = 1, n
    Invalidate
    = W(j) ... Y(j)
    X(j) = ...
enddo
Invalidate

Doall k = 1, n
    Invalidate
    = W(k)
    = ... X(k)
    = ... Y(k)
enddo
Invalidate
Doserial i = 1, n
    = ... X(i)
    = ... X(f(i))
enddo

Figure 2. Program example.

The algorithm is presented in Veidenbaum. 6
The correctness of the algorithm is
proven by showing that the conditions
necessary for an incoherence to occur are
not satisfied in programs processed by the
algorithm.

This algorithm preserves all temporal 
locality at each task level. It satisfies the 
three requirements set out for a scalable 
coherence scheme. 

Improving the cache management al¬ 
gorithm. In this section, we describe di¬ 
rections for possible extensions of the 
cache management algorithm, such as al¬ 
lowing caching to be used in some Doac¬ 
ross loops and reducing the number of 
cache invalidations by doing a more de¬ 
tailed analysis. 
Doall i = 1, n
    invalidate          /* an invalidate-cache to reset the change bits */
    Y(i) = ...
    = W(i) ... Y(i)     /* cache-reads */
    = ... X(i)          /* cache-read */
enddo
invalidate              /* an invalidate-cache to reset the change bits */

Doall j = 1, n
    invalidate          /* an invalidate-cache to reset the change bits */
    = W(j) ... Y(j)     /* cache-reads */
    X(j) = ...
enddo
invalidate              /* an invalidate-cache to reset the change bits */

Doall k = 1, n
    invalidate          /* an invalidate-cache to reset the change bits */
    = W(k)              /* cache-read */
    = ... X(k)          /* memory-read */
    = ... Y(k)          /* cache-read */
enddo
invalidate              /* an invalidate-cache to reset the change bits */
Doserial i = 1, n
    = ... X(i)          /* memory-read */
    = ... X(f(i))       /* memory-read */
enddo

Figure 3. Program example.


A Doacross loop is executed by assign¬ 
ing successive iterations to different pro¬ 
cessors (modulo the number of processors 
available). The cross-iteration dependen¬ 
cies that exist in a Doacross are thus be¬ 
tween statements executed by different 
processors. Synchronization primitives 
have to be used between these processors 
to ensure that dependencies are satisfied, 
for example, the classical P and V primi¬ 


tives. A straightforward solution is to issue 
an Invalidate instruction after the P by each 
processor executing a statement depend¬ 
ing on a statement executed by another 
processor. Since the shared memory has 
the current value after the V instruction 
and the cache does not have anything, the 
value will be fetched out of global mem¬ 
ory. Otherwise, the Doacross loops can be 
treated the same way as the Doall loops by 
the cache management algorithm. The 
most interesting case of Doacross is one 
with other loops nested in it. In such a case, 
the inner loops take full advantage of 
caches. Invalidation has to be done after 
such loops, anyway. 

The simplified algorithm we presented 
does not directly check the conditions 
necessary for incoherence. It may be pos¬ 
sible to detect that the necessary condi¬ 
tions are not satisfied for any reference in 
a loop. In such a case, invalidation does not 
have to be performed in or after the loop. 
Other cases are a loop such that all the data 
used in the loop has been invalidated by an 
earlier Invalidate instruction or one where 
all the data written is invalidated by a later 
Invalidate before being used. 

The fast selective 
invalidation scheme 

The simple scheme we discussed above 
is not selective in enforcing coherence. It is 
not selective in either analysis or hardware 
used to identify and invalidate only stale 
cache copies. An obvious drawback is that 
valid cache lines cannot stay in cache 
across task-level boundaries; therefore, 
temporal locality is limited. We now try to 
improve performance by considering indi¬ 
vidual references in the analysis phase. We 
also introduce special hardware to enforce 
coherence on a reference by reference 
basis. 

The fast selective invalidation scheme 7 
chooses to enforce coherence at the point 
of a read reference (load). The idea is to 
make sure that every read reference will 
deliver only nonstale cache data to the 
processor; otherwise, the up-to-date copy 
from the global memory will be used. 

Every read reference (load) to shared 
memory in a program will be classified and 
marked by the compiler as either memory- 
read or cache-read. Each load by a proces¬ 
sor is tagged according to the compiler 
marking. Read references are marked 
cache-read if the cache resident copy is 
guaranteed up to date. Read references will 
be marked as a memory-read if the cache 


resident copy might have become stale. 

A processor will generate different types 
of memory operand fetches at runtime 
according to the classification. A cache 
controller treats a cache-read as a read in a 
conventional uniprocessor cache. A memory-read
implies that the cached copy is potentially
stale; therefore, an up-to-date copy will be
loaded from global memory.

Consider the example in Figure 3. Re¬ 
call that all processors executing a subrou¬ 
tine start with an empty cache. Read refer¬ 
ences will be marked as follows: Read 
accesses to the data elements of W are 
marked cache-reads because W is read¬ 
only. Accesses to Y are also cache-reads 
because the writes to Y do not have existing
copies of data elements to turn stale. Ac¬ 
cesses to X before the write in the second 
loop are cache-reads because they precede 
all writes to the X data elements. Accesses 
in the third and the fourth loop are mem¬ 
ory-reads because they might access words 
loaded in the first loop but turned stale by 
the write in the second loop. 
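
The marking of Figure 3 can be reproduced with a conservative, array-granularity rule: a read is marked memory-read if the array was accessed in some earlier task level (so a copy may be cached) and written in a later level that still precedes the read. This rule is a simplification assumed here for illustration; it is not the reference-level analysis of the actual scheme, and the function and input format are hypothetical.

```python
def mark_reads(levels):
    """levels: one list per task level of (op, array), op is "read" or "write".

    Returns (level_index, array, mark) for every read, in program order.
    """
    marks = []
    touched = set()        # arrays accessed in an earlier level
    may_be_stale = set()   # touched arrays later written in an earlier level
    for index, accesses in enumerate(levels):
        for op, array in accesses:
            if op == "read":
                stale = array in may_be_stale
                marks.append((index, array, "memory-read" if stale else "cache-read"))
        written = {array for op, array in accesses if op == "write"}
        may_be_stale |= written & touched
        touched |= {array for _, array in accesses}
    return marks


figure3 = [
    [("write", "Y"), ("read", "W"), ("read", "Y"), ("read", "X")],
    [("read", "W"), ("read", "Y"), ("write", "X")],
    [("read", "W"), ("read", "X"), ("read", "Y")],
    [("read", "X"), ("read", "X")],
]
for level, array, mark in mark_reads(figure3):
    print(level, array, mark)
```

Accesses within the same level are never marked stale against each other, because dependent statements inside a Doall or serial loop execute on the same processor.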

If the compiler can mark only the stale accesses as memory-read, a simple approach would treat the memory-read as a default miss and use the global memory copy. However, the marking is done for each individual read reference with respect to writes by other processors. It might happen that two references marked memory-read are accessing the same location in the same task. In this case, the second reference can use the data in the cache. For example, in the last loop of Figure 3, X(f(i)) may access data brought into the cache by the reference to X(i).

To avoid the unnecessary memory access for the later reference, the following special hardware is introduced: A status bit called the change bit is added to each cache word. The bit is set when a line is loaded into the cache and reset at task-level boundaries. An access marked memory-read will first check if both the valid and the change bits are true. This will indicate a hit. Otherwise a miss occurs. An Invalidate instruction resets the change bits of all lines. It is inserted in the same places as in our first coherence scheme.

Cache operation. Valid and change bits 
are associated with each cache word. The 
change bits are reset at each task-level 
boundary by the processor starting a task. 
The change bit is set when a word is written 
into the cache on a read miss or a write. 

A memory-read to a cache word with a false change bit is a default miss, but a memory-read to a word with a true change bit is treated as a conventional cache access. Therefore, the status of the bit can distinguish the first memory-read or the memory-reads following a write from other accesses to the same line in the same task.
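The behavior of the valid and change bits can be illustrated with a small model. This is a minimal sketch, not the hardware design: the class name `FastSelectiveCache`, the dictionary-based memory, and the write-through policy are assumptions made for illustration only.

```python
class FastSelectiveCache:
    """Toy word-addressed cache with a valid bit and a change bit per word."""

    def __init__(self, memory):
        self.memory = memory   # shared global memory: address -> value
        self.words = {}        # address -> {"valid", "change", "value"}

    def invalidate(self):
        """Invalidate instruction: reset every change bit at a task-level boundary."""
        for word in self.words.values():
            word["change"] = False

    def read(self, addr, cache_read):
        """Return (value, hit); cache_read is the compiler marking of this load."""
        word = self.words.get(addr)
        valid = word is not None and word["valid"]
        hit = valid and (cache_read or word["change"])
        if not hit:
            # Default miss: fetch the up-to-date copy and set both bits.
            word = {"valid": True, "change": True, "value": self.memory[addr]}
            self.words[addr] = word
        return word["value"], hit

    def write(self, addr, value):
        """Write-through store (an assumption): sets the valid and change bits."""
        self.memory[addr] = value
        self.words[addr] = {"valid": True, "change": True, "value": value}


memory = {"X0": 1}
cache = FastSelectiveCache(memory)
cache.read("X0", cache_read=True)             # first loop: miss, loads 1
memory["X0"] = 2                              # another processor writes X0
cache.invalidate()                            # task-level boundary
first = cache.read("X0", cache_read=False)    # memory-read: default miss
second = cache.read("X0", cache_read=False)   # same task: hit on the fresh copy
```

In this run the first memory-read after the boundary misses and picks up the new value, while the second memory-read in the same task hits, which is the case the change bit is meant to capture.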

An Invalidate instruction resets the 
change bits, and it will be inserted in the 
program by the same algorithm as in the 
simple invalidation scheme of the last 
section. 

Used in the traditional sense, a reset valid bit implies that nothing has been loaded or stored in the cache line, and an access to it will cause a default miss. A load or store operation sets the valid bit of an individual cache word. The processor can issue a Clear-Cache instruction to reset all the valid bits and an Invalidate instruction to reset all the change bits using resettable SRAMs.

A cache hit is a function of four Boolean 
variables: 

(1) matched (true on address tag match 
and false otherwise), 

(2) cacheread (true if an access is a 
cache-read and false for memory- 
read), 

(3) change (true for a set change bit
and false for a reset change bit),

(4) valid (true for a set valid bit and 
false for a reset valid bit). 

Given these variables, a cache hit is represented by

$$\mathit{Hit} = \mathit{matched} \land \mathit{valid} \land \bigl(\mathit{cacheread} \lor (\lnot\mathit{cacheread} \land \mathit{change})\bigr)$$
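The hit condition can be written as a predicate over the four Boolean variables. The sketch below takes the inner term to be a memory-read (the negation of cacheread) combined with a set change bit; the function name `is_hit` is chosen here for illustration.

```python
from itertools import product


def is_hit(matched, valid, cacheread, change):
    """Hit = matched AND valid AND (cacheread OR (NOT cacheread AND change))."""
    return matched and valid and (cacheread or ((not cacheread) and change))


# Enumerate the cases in which the tag matches and the word is valid.
table = {
    (cacheread, change): is_hit(True, True, cacheread, change)
    for cacheread, change in product([False, True], repeat=2)
}
```

With a matching tag and a set valid bit, the only miss in `table` is the memory-read with a reset change bit.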

Reference marking scheme. We rely on a parallel Fortran compiler such as Parafrase 5 to insert the Invalidate instruction at appropriate places in the instruction stream and to identify and mark references as cache-read or memory-read. We discuss the reference marking scheme below (a more detailed discussion can be found in Cheong and Veidenbaum 8 ).

The marking of read references is based on the order of the read-write accesses and the task-level boundaries. Flow analysis 8 is used to carry out such marking.

The marking algorithm can be summarized as follows: References to read-only variables within a subroutine are marked cache-read. For variables that are both read and written within the subroutine, all references to a variable preceding the first write to that variable are marked cache-read. The remaining read references are marked according to the following rule: In a parallel execution graph, for each task level $L_j$ that contains a write to a data element, if an access to the data element exists at a level $L_i$ with i < j, all read references to the data element in task levels $L_k$ with k > j should be marked memory-read. The rest of the read references should be marked cache-read.
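The marking rule can be sketched at the granularity of whole task levels. This is a simplification under stated assumptions: variables are tracked by name only, the order of accesses inside a level is ignored, and the function name `mark_reads` and the input layout are hypothetical. The example levels follow the four-loop program used in the text.

```python
def mark_reads(levels):
    """Mark each (variable, level) read as cache-read or memory-read.

    levels: list of {"reads": set, "writes": set}, one entry per task level.
    """
    variables = set()
    for level in levels:
        variables |= level["reads"] | level["writes"]

    marks = {}
    for var in variables:
        stale_from = None   # first level at which reads must be memory-read
        for j, level in enumerate(levels):
            if var in level["writes"]:
                touched_earlier = any(
                    var in prev["reads"] or var in prev["writes"]
                    for prev in levels[:j]
                )
                if touched_earlier:
                    stale_from = j + 1
                    break
        for k, level in enumerate(levels):
            if var in level["reads"]:
                stale = stale_from is not None and k >= stale_from
                marks[(var, k)] = "memory-read" if stale else "cache-read"
    return marks


# Four task levels: three parallel loops followed by a serial loop.
levels = [
    {"reads": {"W", "Y", "X"}, "writes": {"Y"}},
    {"reads": {"W", "Y"}, "writes": {"X"}},
    {"reads": {"W", "X", "Y"}, "writes": set()},
    {"reads": {"X"}, "writes": set()},
]
marks = mark_reads(levels)
```

Here the reads of W and Y stay cache-read at every level, and only the reads of X after the second loop become memory-read.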

The above rule is based on the same necessary conditions as defined in the section entitled “A simple invalidation approach.” The compiler algorithm to mark the references depends on flow analysis to detect the necessary conditions. The analysis detects the order of individual read and write references with respect to task-level boundaries that represent the entry and exit points of parallel loops in the flow graph. References to array variables are analyzed by name only, that is, without considering the subscripts, because it is impossible to analyze accesses to individual array elements in the general case across task-level boundaries.

Summary. The fast selective invalidation scheme is selective by doing the analysis for individual references and by using hardware that can invalidate individual lines. Selective invalidation covers both read-only (W in the example) and read-write (Y and X) variables that are accessed by cache-reads. It preserves more temporal locality than the simple invalidation approach that invalidates at least all cache copies of read-write variables at each task boundary.

It is fast because, instead of sequentially invalidating each cache line, it accomplishes invalidation by resetting the change bit. Using resettable SRAMs, one Invalidate instruction can reset the change bits of all cache lines. Therefore, the time cost to invalidate stale copies is negligible. The actual invalidation occurs when a word is accessed by a miss induced by the state of the change and valid bits. It incurs no time penalty, as compared to explicitly issuing an instruction to invalidate a cache line or even a page. Other methods aimed at selective invalidation either do not achieve the same level of selective invalidation 9 or require sequential invalidation 10 (for a detailed discussion, see Cheong and Veidenbaum 8 ).

Overall, however, this scheme is still not selective enough. Relying on compile-time detection alone, the scheme is forced to be conservative. Even though temporal locality exists across task levels, it cannot be exploited by memory-read references. More selective invalidation methods, and hence better temporal locality, are the target of the next coherence maintenance scheme.


The version control scheme

The goal of preserving temporal locality across task-level boundaries is only satisfied for references marked cache-read in the fast selective scheme. Once a reference is marked as a memory-read, all references to the same variable on successive task levels also have to be marked memory-read. The first access to such variables in each task on a new level causes a default miss. To prevent the loss of this type of temporal locality, the version control scheme is used.

General ideas. The order of writes to a variable (or memory address) from different tasks is completely determined by the task-execution graph, even if it cannot be determined at compile time. Only one task at a time can write the variable. The writes to a variable in one task theoretically produce a different version of the variable than the writes in other tasks.

Multiple writes to a variable within a task produce only one version because only the value of the last write to a variable will ever be read by other tasks, and only by tasks at subsequent task levels. Thus, at the end of a task execution, only one new version is produced for each variable written within the task.

A version of a variable produced in a 
task is the new version of the variable to be 
used at subsequent levels. It can be used by 
the task that generated the new version, but 
no other task at this level can use this 
version of the variable. After the processor 
finishes the task and moves to a subsequent 
level, the new version becomes the current 
version. The current version of the variable 
contains the up-to-date value to be used 
until generation of the next version. Each 
cache copy of a variable in the system must 
belong to a particular version. 

For the scheme to be practical, an array is considered a single variable. Even though a task may write to only a part of an array, a new version is nevertheless assigned for the entire array. If multiple tasks on the same task level write to an array, the writes altogether produce only one new version. A new version of an array variable can still be used in every task at the level where the variable is written because tasks are guaranteed to use disjoint subsets of the array elements.

A processor maintains an integer called the current version number (CVN) for each variable (scalar or array). The value of the CVN of a variable represents the version of the variable that the processor must use. Each processor updates its own CVNs independently of all other processors using compiler-generated instructions. The general idea is to increment CVNs of all variables written at a task level when the processor moves to the next level. CVNs are kept in a separate local memory. Since each array needs only one CVN, the local memory is small.

    Doall i = 1, n
        Y(i) = ...
        ... = W(i) ... Y(i)
        ... = ... X(i)
    enddo
    Increment CVN for Y, ...
    Increment CVN for ...
    Doall j = 1, n
        ... = W(j) ... Y(j)
        X(j) = ...
    enddo
    Increment CVN for X ...
    Increment CVN for ...
    Doall k = 1, n
        ... = W(k)
        ... = ... X(k)
        ... = ... Y(k)
    enddo
    Increment CVN for ...
    Doserial i = 1, n
        ... = ... X(i)
        ... = ... X(f(i))
    enddo

Figure 4. Program example.

Each cache line contains an additional tag called a birth version number (BVN). The BVN of the cache line represents a particular version to which the cache copy belongs. The BVN tag is loaded with the value of CVN on a read miss or CVN+1 on a write.

The version scheme performs the following tasks:

(1) runtime comparison of the BVN of a 
cache line and the CVN of the variable to 
determine if the access is a hit, 

(2) the tagging of each cache line with a 
BVN, and 

(3) proper maintenance of the CVNs. 

Given adequate hardware support, these 
tasks can be achieved with minimal time 
cost to the system. 

Cache management with version numbers. At each reference, hardware compares the CVN and the BVN to avoid stale cache copies. When a cache line is loaded from the global memory, the corresponding CVN of the variable is copied into the BVN field of the cache line. When a cache line is written, the BVN field of the cache line will be set to the new version number of the variable, that is, CVN+1. The BVN of the cache line is checked against the CVN of the variable when the copy is read.

A cache line with a BVN less than the 
CVN of the variable is a stale cache line. 
When this is detected, a cache miss will be 
generated and the up-to-date value will be 
loaded from the global memory. Cache 
lines the processor writes will have their 
BVN equal to the new version number. On 
a subsequent task level, such cache lines 
will be identified by the equality of the 
CVN and the BVN. At the current level, 
the lines have BVN > CVN. 
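The BVN and CVN bookkeeping can be modeled in a few lines. This is a minimal sketch rather than the proposed hardware: the class name `VersionedCache`, the tuple-keyed memory, and the write-through policy are assumptions. The initial values (CVN of 1, and no cached line, which corresponds to a BVN of 0) follow the initialization the article describes.

```python
class VersionedCache:
    """Toy per-processor cache whose lines carry birth version numbers (BVNs)."""

    def __init__(self, memory, variables):
        self.memory = memory                       # global memory: (var, index) -> value
        self.cvn = {var: 1 for var in variables}   # current version numbers
        self.lines = {}                            # (var, index) -> (bvn, value)

    def read(self, var, index):
        """Return (value, hit); a line with BVN < CVN is stale and reloaded."""
        line = self.lines.get((var, index))
        if line is not None and line[0] >= self.cvn[var]:
            return line[1], True
        value = self.memory[(var, index)]
        self.lines[(var, index)] = (self.cvn[var], value)   # BVN := CVN on a miss
        return value, False

    def write(self, var, index, value):
        """Write-through store (an assumption); BVN := CVN + 1."""
        self.memory[(var, index)] = value
        self.lines[(var, index)] = (self.cvn[var] + 1, value)

    def next_level(self, written_vars):
        """Increment the CVN of every variable written at the level just left."""
        for var in written_vars:
            self.cvn[var] += 1


memory = {("X", 0): 10, ("X", 1): 11}
p0 = VersionedCache(memory, ["X"])
p1 = VersionedCache(memory, ["X"])

p0.read("X", 0)            # level 1: p0 loads both elements
p0.read("X", 1)
p0.next_level(set())       # nothing written at level 1 in this toy run
p1.next_level(set())

p0.write("X", 0, 20)       # level 2: each processor writes its own element
p1.write("X", 1, 21)
p0.next_level({"X"})
p1.next_level({"X"})

own_copy = p0.read("X", 0)     # level 3: line written by p0 itself
other_copy = p0.read("X", 1)   # level 3: line made stale by p1's write
```

The line p0 wrote itself is still a hit after the boundary, while the element written by p1 is detected as stale and reloaded.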

CVNs of all variables written on a task level are incremented by a processor when it moves to the next task level. The processor performs this by accessing the CVNs from its local memory. The updates of the CVNs in each processor are done independently, without communication overhead and with little computational overhead.

The CVNs of the same variable kept by 
different processors do not have to agree. 
The fact that the BVN of the copy is less 
than the CVN of the variable, not the exact 
difference of the two numbers, is sufficient 
for maintaining cache coherence. 

Version update. In this section, we describe the version update of a variable in terms of what is done at compile time and what is done at runtime. For simplicity, let us assume an acyclic task-execution graph (the general case is discussed in Cheong and Veidenbaum 11 ). A set of variables $\mathit{Var}_i$ that the tasks at level $L_i$ can write to is computed at compile time and used to update the CVNs of these variables at runtime. When a processor finishes a task at level i and is ready to execute a task at level i + k, it needs to increment the CVN of each variable that could have been modified on level i and the levels that the processor skips, that is,

$$\bigcup_{j=0}^{k-1} \mathit{Var}_{i+j}$$

If a variable is written by another processor at any of the levels skipped, the CVN of the variable will always be larger than the BVN of the corresponding cache line of this processor.
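The set of variables whose CVNs must be incremented when a processor moves from level i to level i + k is the union of the write sets of the levels left behind. A minimal sketch, in which the function name `variables_to_increment` and the example write sets are hypothetical:

```python
def variables_to_increment(var_sets, i, k):
    """Union of Var_{i+j} for j = 0 .. k-1."""
    result = set()
    for j in range(k):
        result |= var_sets[i + j]
    return result


# Hypothetical write sets per task level, following the four-loop example.
var_sets = [{"Y"}, {"X"}, set(), set()]
no_skip = variables_to_increment(var_sets, 1, 1)     # level 1 to level 2
one_skipped = variables_to_increment(var_sets, 0, 2) # level 0 directly to level 2
```

A processor that skips the second level still increments the CVN of X, so any copy of X it loaded earlier is treated as stale.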

There are two ways of dealing with level skipping. One is to allow a processor to skip levels and then apply updates defined by the compiler for each of the skipped levels separately. This requires the processor to calculate or to be notified of the number of levels skipped. Another way is to disallow level skipping and require each processor to at least update the CVNs upon crossing each level boundary.

The same program example as in previous schemes illustrates version update operations (see Figure 4). If variable Y is updated in the first loop and X in the second loop, their CVNs will be increased in every processor at the end of the loops, respectively. As can be seen in Figure 4, the version control scheme preserves more temporal locality than previous schemes. For variable X, if a processor writes to some X(j)s in the second loop, it will reference these copies in its own cache as cache hits in the third loop and the fourth loop. Had the fast selective invalidation scheme been used, first accesses to these copies in these loops would have been default misses.

Hardware support. The version control 
coherence scheme requires the following 
hardware support to reduce the overhead: 

(1) A version manager to maintain the CVNs of each variable in a fast local memory. A CVN is addressed by an identity (ID) number assigned to each variable at either compile time or link time. The version manager can execute two instructions, Increment for a given CVN or Reset for all CVNs, issued by its processor.

(2) The identity number of a variable. This is issued by the processor with every memory reference. The ID field for each variable can be part of the address of the memory reference, such as a segment number in a segmented memory system, or it can be an extension of the address. In the latter case, 16 bits will be sufficient for most programs.

(3) A field in each cache tag that contains the BVN. All BVNs can be reset by a processor instruction.

Before a program execution starts, the CVNs are set to 1 and the BVN field in each cache line is set to 0. Given a finite size of CVN and BVN, an overflow might occur that would require a processor to reinitialize the BVN for all cache copies and the CVN of the variable. The larger the size of the numbers, the fewer such resets needed. However, too large a size increases the hardware cost.

Figure 5 illustrates a simplified view of the hardware block diagram. In parallel with the cache read operation, the ID number in the address is used to retrieve the CVN from the version manager’s memory. The retrieved CVN is compared with the BVN of the cache line. The comparison of the CVN and the BVN is carried out in parallel with the tag comparison of the cache access. Also, loading an up-to-date copy from the global memory and loading the correct CVN into the version field of the cache line can be done in parallel.

When a missed cache line is brought into the cache, its BVN is set to the CVN of the variable. Hence, the correct version number will be written to the BVN field of the up-to-date cache line read from the global memory.

A write operation will update the cache line and update the BVN field of the cache line with the CVN+1. The suboperations associated with a cache write can be carried out in parallel.

Extension to multilevel caches. The version control coherence scheme can be extended to systems with multilevel cache memories. For multilevel caches, hardware schemes rely on protocols in which invalidation signals traverse levels of caches. To reduce such global traffic, an additional restriction called the “inclusion property” is imposed such that an ancestral cache knows if a copy of its line is present in its descendent caches (the ones closer to processors). The inclusion property also requires a line to be present in an ancestral cache for it to be present in a descendent cache. The version control scheme has the advantage that the inclusion property does not need to be maintained, since no invalidation signals need to be sent among caches.

The version control scheme is extendible to multilevel caches by implementing version tags (BVNs) in all the cache memories. The same criteria to determine hits or misses as in the single-level version control scheme apply to caches at all levels. The multilevel case only differs from the single-level case in that the CVNs of the same variable must be consistent among all version managers. Otherwise, a copy shared by two processors might be considered up to date by one processor with a smaller CVN and treated as stale by another with a larger CVN. Uniform CVNs can be obtained by requiring each processor to perform updates of CVNs without skipping levels.

The same mechanism that guarantees that values written on a task level are deposited to the shared memory before crossing the task-level boundary now also guarantees depositing of the values in all caches on a path from a processor to memory. This assures that an up-to-date value is read by a task from an intermediate-level cache after crossing the task-level boundary.

Summary. A reference that would be marked memory-read under the fast selective scheme and would force a default miss right after a task-level boundary can be a cache hit in the version control scheme. More intertask temporal locality is preserved.

Using version numbers, a processor can distinguish the up-to-date cache lines written or loaded by the processor itself from lines possibly modified by other processors. Temporal locality is preserved across task-level boundaries. Neither the simple invalidation scheme nor the fast selective invalidation scheme can preserve this temporal locality. As described earlier, the simple invalidation scheme preserves temporal locality for shared variables only within a task level.

As for the fast selective scheme, temporal locality across task levels is preserved for all variables accessed by references marked cache-read but not by accesses tagged memory-read. The version control scheme does not change the version number of a shared variable until a task level is reached on which it is written. Between levels where the variable is written, the corresponding cache lines are not invalidated once loaded into a cache.

Each processor communicates with its version manager, but this does not add to interprocessor traffic. No communication between processors is required. The number of version update operations is relatively small because one update per processor is needed for a variable written in a task level regardless of the number of writes in the task level and the number of data elements of the variables. Min and Baer 12 independently proposed a similar idea.

Comparison of schemes

To compare the performance of the three schemes, trace-driven simulations were conducted on numerical benchmark routines listed in Table 1. The architecture simulated consisted of processors with private caches connected to interleaved shared-memory systems through interconnection networks. Split instruction/data caches are used with a data cache size of 8,192 words. A coherence block size of one word is still used, but the size of a transfer block is varied. Each processor has an unlimited number of registers. Two system sizes, 32 and 512 processors, are simulated. Only three of the seven benchmark routines used show significant speedup improvement going from the 32- to the 512-processor system.

Benchmark routines (Table 1) are parallelized by Parafrase. The parallelized Fortran routines were interpreted to extract the shared-memory traces. The resulting traces are simulated by a cache simulator.

The metric used for comparison is the shared data hit ratio. As mentioned in the “Assumptions” section, instructions and private data do not cause coherence problems and can be cached as in a uniprocessor cache. In our simulations, coherence enforcement is not applied to instructions and private data.

Each of the schemes invalidates some 
up-to-date cache lines. To determine the 
amount of unnecessary invalidation, we 
obtained the hit ratio for an ideal coherence 
scheme. In this scheme, each processor 
knows exactly which of its cache lines 
have become stale at the end of each task 
level due to writes by other processors. 

Tables 2 and 3 show the simulated ratios for the benchmark routines in Table 1. The column subheadings SI, FS, VC, and IR represent the simple invalidation scheme, the fast selective scheme, the version control scheme, and the ideal data hit ratio, respectively, for different transfer block sizes of one, two, and four words. The average differences (over the number of routines in the table) of hit ratios obtained with each scheme and the ideal hit ratios are 13.26 percent, 5.19 percent, and 1.57 percent, respectively, for the simple invalidation scheme, the fast selective invalidation scheme, and the version control scheme on 32 processors. The average differences for the system with 512 processors are 21.7 percent, 10.61 percent, and 4.65 percent.

Table 1. Benchmark programs.

No.  Subroutine  Program name and description
1    Newrz       Simp2: New velocity computation and volume change in a Lagrangian hydrodynamics program.
2    Ux          Vortex: A PDE solver.
3    Cg          A conjugate gradient matrix solver (for Ax=b).
4    Cmslow      Baro: Nonlinear tendency computation in a barometer program.
5    Lblkern     A kernel of experimental physics computation.
6    Step        Arc3D: Computational fluid dynamics.
7    Parmvr      Pic: Particle in cell program.

Table 2. Simulated data hit ratios for shared data in a 32-processor system.

Table 3. Simulated data hit ratios for shared data in a 512-processor system.

Sub.      Block size of 1               Block size of 2               Block size of 4
No.    SI     FS     VC     IR       SI     FS     VC     IR       SI     FS     VC     IR
1     15.84  34.62  40.59  40.59    37.62  52.88  55.45  55.45    43.56  59.62  62.38  62.38
2     32.45  49.05  49.14  49.46    48.99  65.53  65.63  65.96    57.10  73.64  73.75  74.06
6     28.57  36.24  55.45  73.77    51.01  56.44  69.99  86.11    63.20  66.88  78.28  92.54

(Sub.: Subroutine; SI: simple invalidation scheme; FS: fast selective invalidation scheme; VC: version control scheme; IR: ideal data hit ratio)
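The gap between a scheme and the ideal hit ratio is a simple difference in percentage points. The sketch below works through one cell group only (subroutine 6, block size of 1, 512 processors, values copied from Table 3); it does not attempt to reproduce the averages quoted above, and the function name `gap_to_ideal` is chosen here for illustration.

```python
# Subroutine 6, block size of 1 word, 512 processors (values from Table 3).
row = {"SI": 28.57, "FS": 36.24, "VC": 55.45, "IR": 73.77}


def gap_to_ideal(row):
    """Percentage points by which each scheme falls short of the ideal hit ratio."""
    return {
        name: round(row["IR"] - ratio, 2)
        for name, ratio in row.items()
        if name != "IR"
    }


gaps = gap_to_ideal(row)
```

For this routine the version control scheme closes more than half of the gap left by the simple invalidation scheme.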

The simple invalidation scheme is capable of preserving temporal locality only within each task level. Depending on the program structure and granularity, the simple invalidation scheme delivered a wide range of data cache hit ratios. The fast selective invalidation approach can preserve some temporal locality across task levels, especially for variables for which the access pattern is dominated by a sequence of reads followed by writes. For this reason, the fast selective scheme can improve upon some of the benchmarks on which the simple invalidation scheme did not do well. 7 However, for variables whose access pattern is of alternating read and write accesses, the fast selective invalidation scheme cannot exploit most of the intertask-level temporal locality. The version control scheme provides by far the highest data hit ratio among all three schemes, 11 simply because most of the intertask-level temporal locality can be exploited by the version control mechanism.


Compiler-directed cache coherence strategies provide a viable alternative for cache system design in large-scale multiprocessors. The compiler-directed strategies expose multiprocessor cache management to the compiler and achieve cache coherence with independently managed caches and a hardware cost that grows very slowly as a function of the number of processors only. The most important advantage of independently managed caches is the elimination of interprocessor communication for coherence maintenance.

The three proposed schemes differ in the complexity of the hardware required. The schemes offer a range of cache performance at different costs. Detailed performance evaluation will be needed to select the most cost-effective scheme for a given system.

In a shared-memory multiprocessor, the memory system provides access to the data to be processed and mechanisms for interprocess communication. The bandwidth of the memory system limits the speed of computation in current high-performance multiprocessors due to the uneven growth of processor and memory speeds. Caches are fast local memories that moderate a multiprocessor’s memory-bandwidth demands by holding copies of recently used data, and provide a low-latency access path to the processor. Because of locality in the memory access patterns of multiprocessors, the cache satisfies a large fraction of the processor accesses, thereby reducing both the average memory latency and the communication bandwidth requirements imposed on the system’s interconnection network.

Caches in a multiprocessing environment introduce the cache-coherence problem. When multiple processors maintain locally cached copies of a unique shared memory location, any local modification of the location can result in a globally inconsistent view of memory. Cache-coherence schemes prevent this problem by maintaining a uniform state for each cached block of data.

This article addresses the usefulness of shared-data caches in large-scale multiprocessors, the relative merits of different coherence schemes, and system-level methods for improving directory efficiency.

Several of today’s commercially available multiprocessors use bus-based memory systems. A bus is a convenient device for ensuring cache coherence because it allows all processors in the system to observe ongoing memory transactions. If a bus transaction threatens the consistent state of a locally cached object, the cache controller can take such appropriate action as invalidating the local copy. Protocols that use this mechanism to ensure coherence are called snoopy protocols because each cache snoops on the transactions of other caches. 1

Unfortunately, buses simply don’t have 
the bandwidth to support a large number of 
processors. Bus cycle times are restricted 
by signal transmission times in multidrop 
environments and must be long enough to 
allow the bus to “ring out,” typically a few 
signal propagation delays over the length 
of the bus. As processor speeds increase, 
the relative disparity between bus and 
processor clocks will simply become more 
evident. 

Consequently, scalable multiprocessor systems interconnect processors using short point-to-point wires in direct or multistage networks. Communication along impedance-matched transmission line channels can occur at high speeds, providing communication bandwidth that scales with the number of processors. Unlike buses, the bandwidth of these networks increases as more processors are added to the system. Unfortunately, such networks don’t have a convenient snooping mechanism and don’t provide an efficient broadcast capability.

In the absence of a systemwide broadcast mechanism, the cache-coherence problem can be solved with interconnection networks using some variant of directory schemes. 2 This article reviews and analyzes this class of cache-coherence protocols. We use a hybrid of trace-driven simulation and analytical methods to evaluate the performance of these schemes for several parallel applications.

The research presented in this article is part of our effort to build a high-performance large-scale multiprocessor. To that end, we are studying entire multiprocessor systems, including parallel algorithms, compilers, runtime systems, processors, caches, shared memory, and interconnection networks. We find that the best solutions to the cache-coherence problem result from a synergy between a multiprocessor’s software and hardware components.

Classification of directory schemes

A cache-coherence protocol consists of 
the set of possible states in the local caches, 
the states in the shared memory, and the 
state transitions caused by the messages 
transported through the interconnection 
network to keep memory coherent. To 
simplify the protocol and the analysis, our 
data block size is the same for coherence 
and cache fetch. 

A cache-coherence protocol that does not use broadcasts must store the locations of all cached copies of each block of shared data. This list of cached locations, whether centralized or distributed, is called a directory. A directory entry for each block of data contains a number of pointers to specify the locations of copies of the block. Each directory entry also contains a dirty bit to specify whether or not a unique cache has permission to write the associated block of data.

The different flavors of directory protocols fall under three primary categories: full-map directories, limited directories, and chained directories. Full-map directories 2 store enough state associated with each block in global memory so that every cache in the system can simultaneously store a copy of any block of data. That is, each directory entry contains N pointers, where N is the number of processors in the system. Such directories can be optimized to use a single-bit pointer. Limited directories 3 differ from full-map directories in that they have a fixed number of pointers per entry, regardless of the number of processors in the system. Chained directories 4 emulate the full-map schemes by distributing the directory among the caches.
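The storage difference between full-map and limited directories can be made concrete with a rough bit count per directory entry. This is a sketch under an assumption not stated in the text: each pointer of a limited directory is encoded in just enough bits to name any processor. The function names are hypothetical.

```python
def full_map_bits(n_processors):
    """One single-bit pointer per processor plus a dirty bit."""
    return n_processors + 1


def limited_bits(n_processors, n_pointers):
    """A fixed number of pointers, each wide enough to name any processor
    (an assumed encoding), plus a dirty bit."""
    pointer_width = max(1, (n_processors - 1).bit_length())
    return n_pointers * pointer_width + 1


small = (full_map_bits(32), limited_bits(32, 4))
large = (full_map_bits(512), limited_bits(512, 4))
```

Under this encoding the full-map entry grows linearly with the number of processors, while a four-pointer limited entry grows only with the pointer width.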

To analyze these directory schemes, we chose at least one protocol from each category. In each case, we tried to pick the protocol that was the least complex to implement in terms of the required hardware overhead. Our method for simplifying a protocol was to minimize the number of cache states, memory states, and types of protocol messages. All of our protocols guarantee sequential consistency, which Lamport 5 defined to ensure the correct execution of multiprocess programs.

Full-map directories. The full-map 
protocol uses directory entries with one bit 
per processor and a dirty bit. Each bit 
represents the status of the block in the 
corresponding processor’s cache (present 
or absent). If the dirty bit is set, then one 
and only one processor’s bit is set, and that 
processor has permission to write into the 
block. A cache maintains two bits of state 
per block. One bit indicates whether a 
block is valid; the other bit indicates 
whether a valid block may be written. The 
cache-coherence protocol must keep the 
state bits in the memory directory and those 
in the caches consistent. 

Figure 1a illustrates three different
states of a full-map directory. In the first 
state, location X is missing in all of the 
caches in the system. The second state 
results from three caches (C1, C2, and C3)
requesting copies of location X. Three 
pointers (processor bits) are set in the entry 
to indicate the caches that have copies of 
the block of data. In the first two states, the 
dirty bit — on the left side of the directory 
entry — is set to clean (C), indicating that 
no processor has permission to write to the 
block of data. The third state results from 
cache C3 requesting write permission for 
the block. In this final state, the dirty bit is 
set to dirty (D), and there is a single pointer 
to the block of data in cache C3. 

It is worth examining the transition from 
the second state to the third state in more 
detail. Once processor P3 issues the write 
to cache C3, the following events tran¬ 
spire: 


(1) Cache C3 detects that the block 
containing location X is valid but that the 
processor does not have permission to 
write to the block, indicated by the block’s 
write-permission bit in the cache. 

(2) Cache C3 issues a write request to 
the memory module containing location X 
and stalls processor P3. 

(3) The memory module issues invalidate
requests to caches C1 and C2.

(4) Cache C1 and cache C2 receive the
invalidate requests, set the appropriate bit 
to indicate that the block containing loca¬ 
tion X is invalid, and send acknowledg¬ 
ments back to the memory module. 

(5) The memory module receives the 
acknowledgments, sets the dirty bit, clears 
the pointers to caches C1 and C2, and sends 
write permission to cache C3. 

(6) Cache C3 receives the write permis¬ 
sion message, updates the state in the 
cache, and reactivates processor P3. 

Note that the memory module waits to 
receive the acknowledgments before al¬ 
lowing processor P3 to complete its write 
transaction. By waiting for acknowledg¬ 
ments, the protocol guarantees that the 
memory system ensures sequential consis¬ 
tency. 
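
The six-step transition above can be sketched in a few lines of Python. This is a minimal illustration, not the protocol implementation used in the study: the class name `FullMapEntry`, its methods, and the zero-based cache indices are all assumptions made for the example, and reads of a dirty block are deliberately left out.

```python
class FullMapEntry:
    """Directory state for one block: one presence bit per cache plus a dirty bit."""

    def __init__(self, n_caches):
        self.present = [False] * n_caches
        self.dirty = False

    def read(self, cache):
        # Reads of a dirty block need a write-back, which this sketch leaves out.
        assert not self.dirty, "read of a dirty block is not modeled here"
        self.present[cache] = True

    def write(self, cache):
        """Grant write permission to `cache`; return the caches that were invalidated."""
        # Step 3: invalidate every other cache whose presence bit is set.
        invalidated = [c for c, p in enumerate(self.present) if p and c != cache]
        # Steps 4-5: after all acknowledgments, clear those pointers and set the dirty bit.
        for c in invalidated:
            self.present[c] = False
        self.present[cache] = True
        self.dirty = True
        return invalidated


entry = FullMapEntry(n_caches=4)
for c in (0, 1, 2):          # caches C1, C2, and C3 read location X
    entry.read(c)
acks = entry.write(2)        # processor P3 writes through cache C3
print(acks, entry.present, entry.dirty)
```

The list returned by `write` corresponds to the acknowledgments that the memory module must collect before it releases the stalled processor.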

The full-map protocol provides a useful 
upper bound for the performance of cen¬ 
tralized directory-based cache coherence. 
However, it is not scalable with respect to 
memory overhead. Assume that the 
amount of distributed shared memory in¬ 
creases linearly with the number of 
processors N. Because the size of the direc¬ 
tory entry associated with each block of 
memory is proportional to the number of 
processors, the memory consumed by the 
directory is proportional to the size of 
memory (Θ(N)) multiplied by the size of
the directory entry (Θ(N)). Thus, the total
memory overhead scales as the square of
the number of processors (Θ(N^2)).

Limited directories. Limited directory 
protocols are designed to solve the direc¬ 
tory size problem. Restricting the number 
of simultaneously cached copies of any 
particular block of data limits the growth 
of the directory to a constant factor. For our 
analysis, we selected the limited directory 
protocol proposed in Agarwal et al. 3 

A directory protocol can be classified as
Dir_i X using the notation from Agarwal et
al. 3 The symbol i stands for the number of
pointers, and X is NB for a scheme with no
broadcast and B for one with broadcast. A
full-map scheme without broadcast is
represented as Dir_N NB. A limited directory
protocol that uses i < N pointers is denoted
Dir_i NB. The limited directory protocol is
similar to the full-map directory, except in
the case when more than i caches request
read copies of a particular block of data.

Figure 1. Three types of directory protocols: (a) three states of a full-map
directory; (b) eviction in a limited directory; and (c) chained directory.


Figure 1b shows the situation when three
caches request read copies in a memory
system with a Dir_2 NB protocol. In this
case, we can view the two-pointer directory
as a two-way set-associative cache of
pointers to shared copies. When cache C3
requests a copy of location X, the memory
module must invalidate the copy in either
cache C1 or cache C2. This process of
pointer replacement is sometimes called
eviction. Since the directory acts as a
set-associative cache, it must have a pointer
replacement policy. Our protocol uses an
easily implemented pseudorandom eviction
policy that requires no extra memory
overhead. In Figure 1b, the pointer to cache
C3 replaces the pointer to cache C2.
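
The eviction behavior can be illustrated with a short Python sketch. The class name `LimitedEntry` and the use of Python's `random.Random` as a stand-in for a hardware pseudorandom source are assumptions of this example, not details of the protocol evaluated here.

```python
import random


class LimitedEntry:
    """Dir_i NB entry: at most `i` pointers to caches holding read copies."""

    def __init__(self, i, seed=0):
        self.i = i
        self.pointers = []
        self.rng = random.Random(seed)   # stand-in for a hardware pseudorandom source

    def read(self, cache):
        """Record a read copy; return the evicted cache, or None if a pointer was free."""
        if cache in self.pointers:
            return None
        victim = None
        if len(self.pointers) == self.i:
            # Pointer overflow: pick a victim whose copy must be invalidated (eviction).
            victim = self.pointers.pop(self.rng.randrange(self.i))
        self.pointers.append(cache)
        return victim


entry = LimitedEntry(i=2)
evictions = [entry.read(c) for c in ("C1", "C2", "C3")]
print(evictions, entry.pointers)
```

The first two reads find free pointers; the third forces one of the earlier readers out, which is the situation shown in Figure 1b.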

Why might limited directories succeed? 
If the multiprocessor exhibits processor 
locality in the sense that in any given inter¬ 
val of time only a small subset of all the 
processors access a given memory word, 
then a limited directory is sufficient to 
capture this small “worker-set” of processors.

Directory pointers in a Dir_i NB protocol
encode binary processor identifiers, so
each pointer requires log2 N bits of memory,
where N is the number of processors in
the system. Given the same assumptions as
for the full-map protocol, the memory
overhead of limited directory schemes
grows as Θ(N log N). These protocols are
considered scalable with respect to memory
overhead because the resources required
to implement them grow approximately
linearly with the number of processors
in the system.
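
To make the two growth rates concrete, the following sketch counts directory bits for both organizations. It assumes one dirty bit per entry in both schemes and a fixed number of memory blocks per node; the function names and the choice of four pointers are illustrative only.

```python
import math


def full_map_bits(n, blocks_per_node):
    """Directory bits for a full map: N presence bits plus a dirty bit per block."""
    return n * blocks_per_node * (n + 1)


def limited_bits(n, blocks_per_node, i):
    """Directory bits for Dir_i NB: i pointers of ceil(log2 N) bits plus a dirty bit."""
    return n * blocks_per_node * (i * math.ceil(math.log2(n)) + 1)


for n in (16, 64, 256, 1024):
    fm = full_map_bits(n, blocks_per_node=1)
    lim = limited_bits(n, blocks_per_node=1, i=4)
    print(f"N={n:5d}  full-map={fm:8d} bits  Dir_4 NB={lim:6d} bits")
```

Under these assumptions the two schemes cost the same at 16 processors, and the gap widens steadily as N grows.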

Dir_i B protocols allow more than i copies
of each block of data to exist, but they
resort to a broadcast mechanism when
more than i cached copies of a block need
to be invalidated. However, interconnection
networks with point-to-point wires do
not provide an efficient systemwide broadcast
capability. In such networks, it is also
difficult to determine the completion of a
broadcast to ensure sequential consistency.
While it is possible to limit some
Dir_i B broadcasts to a subset of the system
(see Agarwal et al. 3 ), we restrict our
evaluation of limited directories to the Dir_i NB
protocols.

Chained directories. Chained directo¬ 
ries, the third option for cache-coherence 
schemes that do not utilize a broadcast
mechanism, realize the scalability of limited
directories without restricting the
number of shared copies of data blocks. 4 
This type of cache-coherence scheme is 
called a chained scheme because it keeps 
track of shared copies of data by maintain¬ 
ing a chain of directory pointers. We inves¬ 
tigated two chained directory schemes. 

The simpler of the two schemes implements
a singly linked chain, which is best
described by example (see Figure 1c).
Suppose there are no shared copies of
location X. If processor P1 reads location X,
the memory sends a copy to cache C1,
along with a chain termination (CT)
pointer. The memory also keeps a pointer
to cache C1. Subsequently, when processor
P2 reads location X, the memory sends
a copy to cache C2, along with the pointer
to cache C1. The memory then keeps a
pointer to cache C2. By repeating this step,
all of the caches can cache a copy of loca¬ 
tion X. If processor P3 writes to location X, 
it is necessary to send a data invalidation 
message down the chain. To ensure se¬ 
quential consistency, the memory module 
denies processor P3 write permission until 
the processor with the chain termination 
pointer acknowledges the invalidation of 
the chain. Perhaps this scheme should be
called a gossip protocol (as opposed to a
snoopy protocol) because information is
passed from individual to individual,
rather than being spread by covert observation.

Figure 2. Diagram of methodology. Stage 1 takes a parallel program as input
and outputs an address trace. Stage 2 takes the address trace and outputs the
average request rate, block size, and memory latency. Stage 3 takes those
averages and outputs processor utilization.

The possibility of cache-block replacement
complicates chained directory protocols.
Suppose that cache C_1 through cache
C_N all have copies of location X and that
location X and location Y map to the same
(direct-mapped) cache line. If processor P_i
reads location Y, it must first evict location
X from its cache. In this situation, two
possibilities exist:

(1) Send a message down the chain to
cache C_{i-1} with a pointer to cache C_{i+1} and
splice C_i out of the chain, or

(2) Invalidate location X in cache C_{i+1}
through cache C_N.

For our evaluation, we chose the second 
scheme because it can be implemented by 
a less complex protocol than the first. In 
either case, sequential consistency is main¬ 
tained by locking the memory location 
while invalidations are in progress. 
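
A singly linked chain and the second replacement option can be sketched as follows. The class and method names are invented for this illustration. The text does not say how the cache ahead of the evicting one learns that its successor is gone, so the sketch simply makes that predecessor the new chain terminator; that detail is an assumption.

```python
CT = None  # chain-termination pointer


class ChainedBlock:
    """Singly linked chained directory state for one memory block."""

    def __init__(self):
        self.head = CT        # pointer kept at the memory module
        self.next = {}        # per-cache pointer to the next cache down the chain

    def read(self, cache):
        # The new reader receives the old head pointer and becomes the new head.
        self.next[cache] = self.head
        self.head = cache

    def chain(self):
        """Caches in chain order, from the memory's pointer to the chain terminator."""
        order, c = [], self.head
        while c is not CT:
            order.append(c)
            c = self.next[c]
        return order

    def evict(self, cache):
        """Drop `cache` and invalidate everything after it; return the copies removed."""
        order = self.chain()
        k = order.index(cache)
        removed = order[k:]
        for c in removed:
            del self.next[c]
        if k == 0:
            self.head = CT
        else:
            self.next[order[k - 1]] = CT   # assumed: the predecessor now ends the chain
        return removed

    def write(self, cache):
        """Invalidate the whole chain in sequence, then give `cache` the only copy."""
        removed = self.evict(self.head) if self.head is not CT else []
        self.read(cache)
        return removed


blk = ChainedBlock()
for c in ("C1", "C2", "C3", "C4"):
    blk.read(c)
order_after_reads = blk.chain()   # memory -> C4 -> C3 -> C2 -> C1 -> CT
dropped = blk.evict("C3")         # replacement in C3 also invalidates C2 and C1
invalidated = blk.write("C2")     # a write walks the remaining chain
print(order_after_reads, dropped, invalidated, blk.chain())
```

Note that the chain order here is the reverse of the order in which the caches read the block, because each new reader is linked in at the memory end.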

Another solution to the replacement 
problem is to use a doubly linked chain. 
This scheme maintains forward and back¬ 
ward chain pointers for each cached copy 
so that the protocol does not have to trav¬ 
erse the chain when there is a cache re¬ 
placement. The doubly linked directory 
optimizes the replacement condition at the 
cost of a larger average message block size 
(due to the transmission of extra directory
pointers), twice the pointer memory in the
caches, and a more complex coherence 
protocol. 

Although the chained protocols are more 
complex than the limited directory proto¬ 
cols, they are still scalable in terms of the 
amount of memory used for the directo¬ 
ries. The pointer sizes grow as the loga¬ 
rithm of the number of processors, and the 
number of pointers per cache or memory 
block is independent of the number of 
processors. 

Caching only private data. Up to this 
point, we have assumed that caches are 
allowed to store local copies of shared 
variables, thus leading to the cache-consis¬ 
tency problem. An alternative shared 
memory method avoids the cache-coher¬ 
ence problem by disallowing caching of 
shared data. In our analysis, we refer to
this scheme as one that only caches private
data. This scheme caches private data,
shared data that is read-only, and instructions,
while references to modifiable
shared data bypass the cache. In practice,
shared variables must be statically identi¬ 
fied to use this scheme. 

Methodology 

What is a good performance metric for 
comparing the various cache-coherence 
schemes? To evaluate the performance of
the memory system, which includes the
cache, the memory, and the interconnec¬ 
tion network, we determine the contribu¬ 
tion of the memory system to the time 
needed to run a program on the system. Our 
analysis computes the processor utiliza¬ 
tion, or the fraction of time that each pro¬ 
cessor does useful work. One minus the 
utilization yields the fraction of processor 
cycles wasted due to memory system de¬ 
lays. The actual system speedup equals the 
number of processors multiplied by the 
processor utilization. This metric has been 
used in other studies of multiprocessor 
cache and network performance. 6 

In a multiprocessor, processor utiliza¬ 
tion (and therefore system speedup) is 
affected by the frequency of memory refer¬ 
ences and the latency of the memory sys¬ 
tem. The latency (T) of a message through 
the interconnection network depends on 
several factors, including the network 
topology and speed, the number of proces¬ 
sors in the system, the frequency and size 
of the messages, and the memory access 
latency. The cache-coherence protocol 
determines the request rate, message size, 
and memory latency. To compute proces¬ 
sor utilization, we need to use detailed 
models of cache-coherence protocols and 
interconnection networks. 

Figure 2 shows an overview of our an¬ 
alysis process. Multiprocessor address 
traces generated using three tracing meth¬ 
ods at Stanford University, IBM, and MIT 
are run on a cache and directory simulator
that counts the occurrences of different 
types of protocol transactions. A cost is 
assigned to each of these transaction types 
to compute the average processor request 
rate, the average network message block 
size, and the average memory latency per 
transaction. From these parameters, a 
model of a packet-switched, pipelined, 
multistage interconnection network calcu¬ 
lates the average processor utilization. 

Getting multiprocessor address trace 
data. The address traces represent a wide 
range of parallel algorithms written in 
three different programming languages. 
The programs traced at Stanford were 
written in C; at IBM, in Fortran; and at 
MIT, in Mul-T, 7 a variant of Multilisp. The 
implementation of the trace collector dif¬ 
fers for each of the programming environ¬ 
ments. Each tracing system can theoreti¬ 
cally obtain address traces for an arbitrary 
number of processors, enabling a study of 
the behavior of cache-coherent machines 
much larger than any built to date. Table 1 
summarizes general characteristics of the 
traces. We will compare the relative per¬ 
formance of the various coherence 
schemes individually for each application. 

The SA-TSP, MP3D, P-Thor, and LocusRoute
traces were gathered via the
Trap-Bit method using 16 processors.
SA-TSP uses simulated annealing to solve the
traveling salesman problem. MP3D is a 3D
particle simulator for rarefied flow. P-Thor
is a parallel logic simulator. LocusRoute is
a global router for VLSI standard cells.
Weber and Gupta 8 provide a detailed de¬ 
scription of the applications. 

Trap-bit (T-bit) tracing for multiproces¬ 
sors is an extension of single-processor 
trap-bit tracing. In the single processor 
implementation, the processor traps after 
each instruction if the trap bit is set, allow¬ 
ing interpretation of the trapped instruc¬ 
tion and emission of the corresponding 
memory addresses. Multiprocessor T-bit 
tracing extends this method by scheduling 
a new process on every trapped instruc¬ 
tion. Once a process undergoes a trap, the 
trace mechanism performs several tasks. It 
records the corresponding memory ad¬ 
dresses, saves the processor state of the 
trapped process, and schedules another 
process from its list of processes, typically 
in a round-robin fashion. 
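
The scheduling idea can be shown with a toy interleaver. This is only a sketch of the round-robin switch on every instruction; the function name `tbit_trace` and the input format (each instruction given as the list of addresses it touches) are assumptions, and no real trap handling is modeled.

```python
from collections import deque


def tbit_trace(processes):
    """Interleave per-process reference streams, switching process after every instruction.

    `processes` maps a process id to a list of instructions, each given as the
    list of memory addresses that the instruction touches.
    """
    ready = deque((pid, iter(instrs)) for pid, instrs in processes.items())
    trace = []
    while ready:
        pid, instrs = ready.popleft()
        addresses = next(instrs, None)
        if addresses is None:
            continue                                  # process finished; drop it
        trace.extend((pid, a) for a in addresses)     # record the trapped instruction
        ready.append((pid, instrs))                   # save state, move to the next process
    return trace


programs = {0: [[0x100], [0x104, 0x200]], 1: [[0x300]]}
trace = tbit_trace(programs)
print(trace)
```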

The Weather, Simple, and fast Fourier
transform traces were derived using the
postmortem scheduling method at IBM.
The Weather application partitions the
atmosphere around the globe into a
three-dimensional grid and uses finite-difference
methods to solve a set of partial differential
equations describing the state of
the system. Simple models the behavior of
fluids and employs finite-difference methods
to solve equations describing hydrodynamic
behavior. FFT is a radix-2 fast
Fourier transform.

Table 1. Summary of trace statistics, with length values in millions of references
to memory.

Source                Language  Processors  Application  Length
VAX T-bit             C         16          P-Thor         7.09
                                            MP3D           7.38
                                            LocusRoute     7.05
                                            SA-TSP         7.11
Postmortem scheduler  Fortran   64          FFT            7.44
                                            Weather       31.76
                                            Simple        27.03
T-Mul-T               Mul-T     64          Speech        11.77

Postmortem scheduling is a technique 
that generates a parallel trace from a uni¬ 
processor execution trace of a parallel 
application. The uniprocessor trace is a 
task trace with embedded synchronization 
information that can be scheduled, after 
execution (postmortem), into a parallel
trace that obeys the synchronization con¬ 
straints. This type of trace generation uses 
only one processor to produce the trace and 
to perform the postmortem scheduling. So, 
the number of processes is limited only by 
the application’s synchronization con¬ 
straints and by the number of parallel tasks 
in the single processor trace. 
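
The idea of scheduling a task trace after the fact can be illustrated with a simple greedy list scheduler. This is a generic sketch, not the IBM tool: the function name, the input format, and the rule of placing each task on the earliest free processor are all assumptions made for the example.

```python
import heapq


def postmortem_schedule(tasks, deps, n_procs):
    """Assign tasks from a uniprocessor task trace to processors after the fact.

    `tasks[t]` is the number of references in task t, listed in uniprocessor order;
    `deps[t]` is the set of tasks that must finish before t may start.
    Returns {task: (processor, start_time, finish_time)}.
    """
    free_at = [(0, p) for p in range(n_procs)]   # (time the processor is free, id)
    heapq.heapify(free_at)
    placed = {}
    for t, length in tasks.items():              # uniprocessor order already respects deps
        ready = max((placed[d][2] for d in deps.get(t, ())), default=0)
        free, p = heapq.heappop(free_at)         # earliest available processor
        start = max(free, ready)
        placed[t] = (p, start, start + length)
        heapq.heappush(free_at, (start + length, p))
    return placed


tasks = {"a": 3, "b": 2, "c": 4, "d": 1}
deps = {"c": {"a"}, "d": {"b", "c"}}
schedule = postmortem_schedule(tasks, deps, n_procs=2)
print(schedule)
```

Because the uniprocessor run executed the tasks in a legal order, every dependency of a task has already been placed by the time the task is scheduled.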

The Speech trace was generated by a 
compiler-aided tracing scheme. The appli¬ 
cation comprises the lexical decoding 
stage of a phonetically based spoken lan¬ 
guage understanding system developed by 
the MIT Spoken Language Systems 
Group. The Speech application uses a dic¬ 
tionary of about 300 words represented by 
a 3,500-node directed graph. The input to 
the lexical decoder is another directed 
graph representing possible sequences of 
phonemes in the given utterance. The 
application uses a modified Viterbi search 
algorithm to find the best match between 
paths through the two graphs. 

In a compiler-based tracing scheme, 
code inserted into the instruction stream of 
a program at compile time records the 
addresses of memory references as a side 
effect of normal execution. Our compiler-aided
multiprocessor trace implementation
is T-Mul-T, a modification of the Mul-T
programming environment that can be
used to generate memory address traces for 
programs running on an arbitrary number 
of processors. Instructions are not cur¬ 
rently traced in T-Mul-T. We assume that 
all instructions hit in the cache and, for 
processor utilization computation, an in¬ 
struction reference is associated with each 
data reference. We make these assumptions
only for the Speech application,
because the other traces include instructions.

The trace gathering techniques also dif¬ 
fer in their treatment of private data loca¬ 
tions, which must be identified for the 
scheme that only caches private data. The 
private references are identified statically 
(at compile time) in the Fortran traces and 
are identified dynamically by post¬ 
processing the other traces. Since static 
methods must be more conservative than 
dynamic methods when partitioning pri¬ 
vate and shared data, the performance that 
we predict for the private data caching 
scheme on the C and Mul-T applications is 
slightly optimistic. In practice, the non¬ 
trivial problem of static data partitioning 
makes it difficult to implement schemes 
that cache only private data. 
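
Dynamic identification by postprocessing can be sketched as a single pass over a trace. The rule used below (a location is cacheable if only one processor touches it, or if it is never written) follows the description of the private-data scheme given earlier, but the function name and trace format are assumptions, and this is not the authors' postprocessor.

```python
from collections import defaultdict


def cacheable_addresses(trace):
    """Return the addresses that a private-data-only scheme would allow in the cache.

    `trace` is a sequence of (processor, op, address) with op "r" or "w".
    An address qualifies if one processor alone references it (private) or if
    no processor ever writes it (read-only shared).
    """
    users = defaultdict(set)
    written = set()
    for proc, op, addr in trace:
        users[addr].add(proc)
        if op == "w":
            written.add(addr)
    return {a for a, procs in users.items() if len(procs) == 1 or a not in written}


trace = [
    (0, "r", 0x10), (0, "w", 0x10),   # private to processor 0
    (1, "r", 0x20), (2, "w", 0x20),   # modifiable shared data: bypasses the cache
    (1, "r", 0x30), (2, "r", 0x30),   # read-only shared data
]
cacheable = cacheable_addresses(trace)
print(sorted(cacheable))
```

A dynamic pass like this sees the whole execution, which is why it can classify more locations as private than a compiler could prove statically.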

Simulating a cache-coherence strat¬ 
egy. For each memory reference in a trace, 
our cache and directory simulator deter¬ 
mines the effects on the state of the corre¬ 
sponding block in the cache and the shared 
memory. This state consists of the cache 
tags and directory pointers used to main¬ 
tain cache coherence. In the simulation, 
the network provides no feedback to the 
cache or memory modules. We assume that all
side effects from each memory transaction
(entry in the trace) occur simultaneously.
While this simulation strategy does
not accurately model the state of the
memory system on a cycle-by-cycle basis,
it does produce accurate counts of each
type of protocol transaction over the length
of a correct execution of a parallel program.

Table 2. Simulation parameter defaults for the cache, directory, and network.

Type of Parameter  Name                              Default Value
Cache/Directory    Cache size                        256 Kbytes
                   Cache-block size                  16 bytes
                   Cache associativity               Direct mapped
                   Cache-update policy               Write back
                   Directory pointer replace policy  Random
Network            Network message header size       16 bits
                   Network switch size               4 x 4
                   Network channel width             16 bits
                   Processor cycle time              2 x network switch cycle time
                   Memory address size               32 bits
                   Base memory access time           6 x network switch cycle time

However, since we assume that all side
effects of any transaction occur simultaneously,
we do not model the difference between
sequential and concurrent operations.
This inaccuracy particularly affects
the analysis of chained directory schemes. 
Specifically, when a shared write is per¬ 
formed in a system that uses a chained 
directory scheme, the copies of the written 
location must be invalidated in sequence, 
while a centralized directory scheme may 
send the invalidations in parallel and keep 
track of the number of outstanding ac¬ 
knowledgments. Thus, the minimum la¬ 
tency for shared writes to clean cache 
blocks is greater for the distributed 
schemes than for the centralized schemes. 

Analyzing the trade-offs between cen¬ 
tralized and distributed schemes requires a 
much more detailed simulation. While it is 
possible to accurately model the memory 
system on a cycle-by-cycle basis, such a 
simulation requires much higher overhead 
than our simulations in terms of both pro¬ 
gramming time and simulation runtime. 
Our MIT research group is running experi¬ 
ments on a simulator for an entire multi¬ 
processor system. Simulations of the en¬ 
tire system run approximately 100 times 
slower than the trace-driven simulations 
used for this article. Variants of coherence 
schemes are harder to implement in the 
detailed simulator than in the trace-driven 
environment. To investigate a wide range 
of applications and cache-coherence 
protocols, we avoided the high overhead of
such detailed simulations by performing
trace-driven simulations. 

In a trace-driven simulation, a memory 
transaction consists of a processor-to- 
memory reference and its effect on the 
state of the memory system. Any transac¬ 
tion that causes a message to be sent out 
over the network contributes to the aver¬ 
age request rate, average message size, 
and average memory latency. Each type 
of transaction is assigned a cost in terms 
of the number of messages that must be 
sent over the network (including both the 
requests and the responses), the latency 
encountered at the memory modules, and 
the total number of words (including rout¬ 
ing information) transported through the 
network. Given a trace and a particular 
cache-coherence protocol, the cache and 
directory simulator determines the per¬ 
centage of each transaction type in the 
trace. The percentage of the transaction 
type, multiplied by its cost, gives the
contribution of the transaction to each of the
three parameters listed above. 
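
The weighting step can be sketched as follows. The transaction names, the cost figures, and the way the totals are normalized per message are all hypothetical; the text states only that each type's share is multiplied by its cost.

```python
def average_parameters(fractions, costs):
    """Weight each transaction type's cost by its share of all memory references.

    `fractions[t]` is the fraction of references that are transaction type t.
    `costs[t]` is (messages, words, memory_cycles) for one transaction of type t.
    """
    msgs = sum(fractions[t] * costs[t][0] for t in fractions)
    words = sum(fractions[t] * costs[t][1] for t in fractions)
    cycles = sum(fractions[t] * costs[t][2] for t in fractions)
    if msgs == 0:
        return {"request_rate": 0.0, "message_size": 0.0, "memory_latency": 0.0}
    return {
        "request_rate": msgs,             # messages per memory reference
        "message_size": words / msgs,     # words per message (assumed normalization)
        "memory_latency": cycles / msgs,  # memory cycles per message (assumed normalization)
    }


# Hypothetical mix: cache hits send no messages, so they are left out.
fractions = {"read_miss": 0.02, "write_invalidate": 0.01}
costs = {"read_miss": (2, 10, 6), "write_invalidate": (4, 8, 6)}
params = average_parameters(fractions, costs)
print(params)
```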

In addition to the cache-coherence 
strategy, other parameters affect the per¬ 
formance of the memory system. We 
chose values for these parameters (listed 
in Table 2) based on the technology used 
for contemporary multiprocessors. Al¬ 
though we chose a 256-kilobyte cache, the 
results of our analysis do not differ sub¬ 
stantially for cache sizes from 256 kilo¬ 
bytes down to 16 kilobytes because the 
working sets for the applications are small 
when partitioned over a large number of 
processors. The effect of other parame¬ 
ters, including the cache-block size, has 
been explored in several studies (see
Eggers and Katz 9 and references therein).

The interconnection network model. 

The directory schemes that we analyze 
transmit messages over an interconnection 
network to maintain cache coherence. 
They distribute shared memory and associ¬ 
ated directories over the processing nodes. 
Our analysis uses a packet-switched, buff¬ 
ered, multistage interconnection network 
that belongs to the general class of Omega 
networks. The network switches are pipe¬ 
lined so that a message header can leave a 
switch even while the rest of the message is 
still being serviced. A protocol message
travels through n network switch stages to 
the destination node and takes M cycles for 
the memory access. The network is buff¬ 
ered and guarantees sequenced delivery of 
messages between any two nodes on the 
network. 

Computation of the processor utiliza¬ 
tion is based on the analysis method that 
Patel 10 used. The network model yields the 
average latency T of a protocol message 
through the network with n stages, k x k 
size switches, and average memory delay 
M. We derive processor utilization U from 
a set of three equations: 


ρ = UmB



where m is the probability a message is
generated on a given processor cycle, with
corresponding network latency T. The
channel utilization (ρ) is the product of the
effective network request rate (Um) and
the average message size B. The latency
equation uses the packet-switched network
model by Kruskal and Snir. 11 The first term
in the equation (n + B + M - 1) gives the 
latency through an unloaded network. The 
second term gives the increase in latency 
due to network contention, which is the 
product of the contention delay through 
one switch and the number of stages. We 
verified the model in the context of our 
research by comparing its predictions to 
the performance of a packet-switched net¬ 
work simulator that transmitted messages 
generated by a Poisson process. 
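
Only the channel-utilization relation and the unloaded-network term are spelled out in the text here, so the sketch below fills in the rest with assumed forms: U = 1/(1 + mT) for utilization, and a per-stage contention delay of ρB(1 - 1/k)/(2(1 - ρ)), in the style of buffered-network queueing models. Both forms, the function names, the sample values, and the use of one common cycle unit are assumptions of this sketch. It illustrates how such coupled equations can be solved by bisection and should not be read as the exact model behind the reported results.

```python
def latency(U, m, B, M, n, k):
    """Message latency T at utilization U (contention term is an assumed form)."""
    rho = U * m * B                                        # channel utilization
    contention = n * rho * B * (1 - 1 / k) / (2 * (1 - rho))
    return n + B + M - 1 + contention                      # unloaded latency + contention


def utilization(m, B, M, n, k, iters=60):
    """Solve U = 1 / (1 + m * T(U)) for U by bisection (requires m > 0 and B > 0)."""
    lo, hi = 0.0, min(1.0, 0.999999 / (m * B))             # keep rho below 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid < 1 / (1 + m * latency(mid, m, B, M, n, k)):
            lo = mid      # the equations allow a higher utilization than mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical operating point: 3 stages of 4 x 4 switches, 4-word messages.
U = utilization(m=0.05, B=4, M=6, n=3, k=4)
print(f"processor utilization: {U:.3f}")
```

Because the assumed contention term grows with ρB, and ρ itself contains B, latency in this form rises with the square of the message size but only linearly with the message rate.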

Table 2 shows the default network parameters
we used in our analysis. While this
article presents results for a packet-switched
multistage network, it is possible
to derive results for other types of networks
by varying the network model used
in the final stage of the analysis. In fact, we
repeated our analysis for the direct,
two-dimensional mesh network that we plan to
use in our own machine. With the direct
network model, the cache-coherence
schemes showed the same relative behavior
as they did with the network model
described above. The ability to use the
results from one set of directory simulations
to derive statistics for a range of
network or bus types displays the power of
this modeling method.

Figure 3. Comparison of coherence schemes.

Analysis of directory schemes

The graphs presented below plot various 
combinations of applications and cache- 
coherence schemes on the vertical axis and 
processor utilization on the horizontal 
axis. Since the data reference characteris¬ 
tics vary significantly between applica¬ 
tions and trace gathering methods, we do 
not average results from the different 
traces. The results presented here concen¬ 
trate on the Weather, Speech, and P-Thor 
applications. We discuss other applica¬ 
tions when they exhibit significantly dif¬ 
ferent behavior. 

Are caches useful for shared data? 

Figure 3 shows the processor utilizations 
realized for the Weather, Speech, and P-Thor
applications using each of the coherence
schemes we evaluated. The long bar
at the bottom of each graph gives the value 
for “no cache coherence.” This number is 
derived by considering all addresses in 
each trace to be not shared. Processor utili¬ 
zation with no cache coherence gives, in a 
sense, the effect of the native hit/miss rate 
for the application. The number is artificial 
because it does not represent the behavior 
of a correctly operating system. However, 
the number does give an upper bound on 
the performance of any coherence scheme 
and allows us to focus on the component of 
processor utilization lost due to sharing 
between processors. 

To assess the potential of shared data 
caching schemes in general, we compare 
the optimal (full-map) directory scheme to 
the scheme that caches only private data. 
For most applications (including the ones 
shown in Figure 3), the full-map directory 
yields significantly better processor utili¬ 
zation than the scheme that caches only 
private data. Generally good performance 
of the full-map scheme in 16- and 64-processor
machines implies that caches are
useful for shared data, even when applications
are not written or compiled specially
for a system with directory-based cache 
coherence. 

However, for two traces (Simple and 
MP3D), processor utilization for a full- 
map directory is worse than the utilization 
for the private data-cache scheme. Exam¬ 
ining the network model shows the reason 
it is possible for private data caches to 
perform better than full-map directories: 
Even though the private cache scheme has 
a higher network message rate, it uses 
smaller message block sizes. In the model, 
network latency is proportional to the 
square of the message block size but is 
linearly dependent on the message rate. 

The fact that for Simple and MP3D the 
private data-cache scheme performs better 
than the full-map directory scheme indi¬ 
cates that the average time between writes 
by different processors to each shared 
location is low. For these traces, the full- 
map directory scheme does not perform 
significantly better than the limited direc¬ 
tory schemes. 


Limited directory performance. How
well do limited directories perform compared
to the full-map directory scheme?
The answer depends on the amount of 
shared data, the number of processors that 
access each shared data location, and the 
method of synchronization. The P-Thor 
application was written to minimize com¬ 
munication between processors by reduc¬ 
ing the number of synchronization points 
and the number of processors that read 
each shared location. It is not surprising 
that all of the directory schemes perform 
well for this application. 

On the other hand, four traces show 
significantly worse processor utilization 
for limited directories than for a full-map 
directory due to naive synchronization 
techniques (Weather, Simple, and SA- 
TSP) or widespread sharing of a large read¬ 
only data structure (Speech). 

Chained directory performance.
When applications use data structures that
are widely shared and accessed frequently,
a limited directory performs significantly
worse than a full-map directory. However,
Figure 3 shows that both singly and doubly
linked directories perform almost as well
as the full-map directory protocols. While
the doubly linked scheme always performs
slightly better than the singly linked
scheme, the small increase in performance
may not justify the additional resources
needed for the doubly linked scheme. The
difference between the schemes is small
because the number of replacements as a
percentage of total memory accesses is
very small, even though we simulated
direct-mapped caches.

Figure 4. System-level optimizations.

In general, chained directory schemes 
yield higher utilization than limited direc¬ 
tory protocols. However, chained direc¬ 
tory protocols are more complex and have 
higher write latency than limited directory 
protocols. We are still investigating the 
ramifications of this trade-off. 

Improving the performance of directories

The results presented above show that 
limited directory schemes suffer from data 
types that are both widely shared and fre¬ 
quently referenced. We use the Weather 
and Speech applications as case studies to 
demonstrate two methods for ameliorating 
the effects of this type of data. These methods
are examples of system-level optimizations
because they involve contributions
from several components of a multiproces¬ 
sor system. In addition to improving the 
performance of limited directory schemes, 
the methods also enhance the performance 
of the other coherence schemes. 

The Weather application uses barriers as 
the primary method of synchronization. In 
the straightforward implementation of 
barriers, each processor increments a bar¬ 
rier variable and then spin-locks on a bar¬ 
rier flag. The last processor to reach the 
synchronization point increments the bar¬ 
rier variable to its final value N and writes 
into the barrier flag, thereby releasing the 
spinning processors. The memory accesses 
from many processors spin-locking on a
single location cause pointer thrashing 
(repeated evictions) in the limited direc¬ 
tory. 

A software solution, called a combining
tree, 12 can alleviate this problem in directories.
Instead of implementing barrier
synchronizations with a single barrier vari¬ 
able and barrier flag, a balanced tree struc¬ 
ture of nodes can be used for each. To 
demonstrate the benefits of this barrier 
implementation, we modified the 
postmortem scheduler to implement com¬ 
bining tree synchronization. The resulting 
trace was virtually identical to the original 
trace, except with respect to the distribu¬ 
tion of synchronization address accesses. 
In the original trace, all of the synchronization
addresses were accessed by all of the
processors. In the combining-tree trace, 
almost all of the synchronization addresses 
were accessed primarily by one processor, 
with just one access by one other proces¬ 
sor. 
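
The change in access distribution can be illustrated by counting which processors touch each synchronization address under the two barrier styles. This sketch models only the arrival phase, uses a fan-in of two, and lets the lowest-numbered processor of each group stand in for the last arriver; the function names and all of these choices are assumptions of the example, not details of the modified scheduler.

```python
def flat_barrier_sharers(n_procs):
    """With a single barrier variable, every processor touches the same address."""
    return {"barrier": set(range(n_procs))}


def tree_barrier_sharers(n_procs, fan_in=2):
    """Map each combining-tree node to the set of processors that access it on arrival.

    Processors are grouped `fan_in` at a time; one representative per group
    (the lowest-numbered, standing in for the last arriver) moves up a level.
    """
    assert fan_in >= 2, "a fan-in below 2 would never shrink the group"
    sharers = {}
    level, group = 0, list(range(n_procs))
    while len(group) > 1:
        winners = []
        for j in range(0, len(group), fan_in):
            members = group[j:j + fan_in]
            sharers[(level, j // fan_in)] = set(members)
            winners.append(members[0])
        group, level = winners, level + 1
    return sharers


flat = flat_barrier_sharers(64)
tree = tree_barrier_sharers(64)
print(max(len(s) for s in flat.values()), max(len(s) for s in tree.values()), len(tree))
```

With a fan-in of two, no tree node is touched by more than two processors, so each node's worker-set fits in a two-pointer directory entry.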

The top graph in Figure 4 shows that the 
combining tree dramatically improves the 
performance of the limited directory 
schemes. The darker colored bars show the 
processor utilization of the application 
with linear barrier synchronization, and 
the lighter bars show the enhanced utiliza¬ 
tion when using the combining-tree struc¬ 
ture. The two- and four-pointer directories 
yield nearly the same processor utilization 
as the full-map scheme. The one pointer di¬ 
rectory suffers from sharing of other data 
between processors. However, this data 
sharing must exist only between processor 
pairs, because it does not affect the two- 
pointer directory. Thus, combining-tree 
structures together with limited directory schemes 
provide an efficient implementation of 
barrier synchronization. 

The Speech application provides an 
example of both a different programming 
model and a different type of widely shared 
data. There are two primary data structures 
in the Speech application: an utterance (the 
sentence to be identified) and a dictionary 
(the algorithm’s vocabulary). For the dura¬ 
tion of the application, these data struc¬ 
tures are only read, but they are shared by 
all the processors in the system. This type 
of data reference pattern causes pointer 
thrashing in limited directories. 

Given the nature of the Speech applica¬ 
tion, it is fair to assume that all the read¬ 
only variables can be identified by the 
programmer. To assess the potential bene¬ 
fits of marking read-only data, we post- 
processed the trace to find all the data 
locations that were only read for the dura¬ 
tion of the trace. The read-only locations 
were then marked as private to prevent the 
cache and directory simulator from execut¬ 
ing coherence transactions for this data. 
When these locations were identified on a 
block-by-block basis, the system showed 
moderate improvement for the limited di¬ 
rectory schemes. However, when the post¬ 
processor identified the read-only loca¬ 
tions on a word-by-word basis and relo¬ 
cated the data to a special segment of 
memory, the improvement was more pro¬ 
nounced. The bottom graph in Figure 4 
demonstrates the increase in processor 
utilization realized by specially processing 
read-only data. The darkest bars show the 
unoptimized performance of the Speech 
application; the lighter bars show the gains 
due to processing read-only data. 

The boost in processor utilization due to 
read-only data detection on a word-by- 
word basis can be explained by the reduc¬ 
tion of sharing due to cache blocks that 
contain unrelated data words accessed by 
different processors. The Mul-T runtime 
system ignored the boundary of cache 
blocks and allocated read-write data words 
in the same cache blocks as read-only data 
words. This data allocation policy pre¬ 
vented the block-by-block postprocessor 
from properly identifying read-only data 
words and lowered processor utilization 
by creating unnecessary shared data traffic 
in the network. 
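
The difference between the two postprocessing granularities can be illustrated with a small sketch. It is not the postprocessor used in the study: the trace record layout, the region size NWORDS, and the block size WORDS_PER_BLOCK are assumptions chosen for the example. A word counts as read-only if it is read at least once and nothing in its unit (a word, or a whole block) is ever written.

```c
#include <assert.h>
#include <stddef.h>

#define WORDS_PER_BLOCK 4           /* assumed block size, in words */
#define NWORDS 16                   /* assumed size of the traced region */

struct ref {                        /* one trace record (hypothetical layout) */
    unsigned word;                  /* word address within the region */
    int is_write;                   /* nonzero for a write reference */
};

/* Count the words identified as read-only when writes are tracked per
 * unit of `unit` words: unit = 1 is word-by-word identification,
 * unit = WORDS_PER_BLOCK is block-by-block identification. */
size_t count_read_only(const struct ref *trace, size_t n, unsigned unit)
{
    int was_read[NWORDS] = {0};
    int unit_written[NWORDS] = {0}; /* indexed by word / unit */
    size_t count = 0;

    assert(unit >= 1);
    for (size_t i = 0; i < n; i++) {
        assert(trace[i].word < NWORDS);
        if (trace[i].is_write)
            unit_written[trace[i].word / unit] = 1;
        else
            was_read[trace[i].word] = 1;
    }
    for (unsigned w = 0; w < NWORDS; w++)
        if (was_read[w] && !unit_written[w / unit])
            count++;
    return count;
}
```

A single read-write word placed in a block of otherwise read-only words hides the whole block from block-by-block identification, which is the effect attributed above to the allocation policy of the runtime system.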

When multiprocessor algorithms and 
software are optimized for caches, large- 
scale cache-coherent systems realize their 
execution potential. In the case of the 
Weather and Speech applications, system- 
level optimizations resulted in processor 
utilizations between 0.6 and 0.8 for scal¬ 
able cache-coherence protocols. Coordi¬ 
nating multiprocessor hardware and soft¬ 
ware requires some subset of programmer 
specifications, new language primitives, 
special compile-time analysis, support in 
the runtime system, specialization in the 
processor-to-cache interface, and addi¬ 
tional states in the cache-coherence proto¬ 
col. The modifications described in this 
article represent archetypes of systemwide 
efforts to improve multiprocessor per¬ 
formance. 


This article has shown that, by using 
system-level optimizations, it is 
possible to build large-scale cache- 
coherent multiprocessors. Using processor 
utilization as a metric, we evaluated the 
performance of several cache-coherence 
protocols, including limited directories 
and chained directories. We compared 
protocols that are scalable in terms of their 
memory overhead to a protocol that cached 
only private data and to a nonscalable 
protocol (full-map). While the scheme that 
cached only private data performed fairly 
well, the shared data caching schemes 
performed better for the majority of the 
applications that we studied. Limited and 
chained directory schemes permitted the 
use of caches to significantly reduce the 
effective shared memory latency. 

There is no hardware panacea for the 
cache-coherence problem. As with many 
other problems in computer architecture, 
good solutions balance hardware and soft¬ 
ware optimizations that combine to im¬ 
prove system performance. When we ap¬ 
plied system-level optimizations to cach¬ 
ing, we were able to improve the perfor¬ 
mance of systems with large numbers of 
processors. 

Our work can be extended in several 
ways. The most straightforward extension 
would repeat our trace-driven evaluation 
using other network models. 

Our research group at MIT is currently 
performing more detailed simulations of 
directory schemes, coupled with processor 
and network simulators, to get accurate 
multiprocessor performance statistics. 
Such simulations allow us to address the 
issue of hot spots, the impact of high- 
latency operations, and the effect of inter¬ 
rupting local cache accesses with external 
invalidation messages. We are also re¬ 
searching various methods for alleviating 
the effects of communication latency. 
These methods include using mul¬ 
tithreaded processors with coherent 
caches, software emulation of directories, 
and coherence models other than sequen¬ 
tial consistency. ■ 

Synchronization on a shared-memory 
system is an important operation, 
since an application’s speedup 
or throughput depends on the operation’s 
efficiency. Synchronization controls ac¬ 
cess to a shared resource, usually some 
data structure shared between processes. 
Fast synchronization mechanisms and 
shared memory have ensured the success 
of shared-memory multiprocessors over 
distributed-memory systems for many 
applications. Higher level synchronization 
mechanisms such as counting semaphores, 
barrier synchronization, and fetch-and- 
operation are often built on top of the 
hardware mechanism in most bus-based 
shared-memory systems. 1 

While software locking algorithms be¬ 
have reasonably efficiently in the absence 
of contention, 2 systems with more than a 
few processors often provide sufficient 
contention to significantly decrease sys¬ 
tem performance. The basic mechanism 
provided in hardware is an atomic memory 
read-write capability often associated with 
a test-and-set instruction on the processor. 
The synchronization mechanism provided 
on most shared-memory machines is a 
hardware lock. Some implementations 
support only a test-and-set operation, 
while others allow arbitrary read-modify- 
write operations. 

Only New York University’s Ultra 3 and 
IBM’s RP3 4 support fetch-and-operation- 
type mechanisms in hardware as part of the 
combining network. Other interesting 
hardware schemes have been suggested for 
shared-memory multiprocessors. 5 

Knowing the right type of locking algorithm 
to use when multiple processes contend for 
a single lock can prevent performance 
degradation in shared-memory multiprocessor 
systems. 

Our study resulted from a performance 
evaluation of the Symmetry multiproces¬ 
sor system. This evaluation revealed that 
the synchronization mechanism on Sym¬ 
metry did not perform well for highly 
contested locks, as found in certain parallel 
applications. These applications are as 
diverse as parallel scientific codes and 
commercial database systems. 
Several software synchronization 
mechanisms were developed and evalu¬ 
ated using a hardware monitor on the 
Symmetry multiprocessor system. The 
purpose of these mechanisms was to re¬ 
duce contention for the lock. The mecha¬ 
nisms remain valuable even when changes 
are made to the hardware synchronization 
mechanism to improve support for highly 
contested locks. 

After a brief look at the Symmetry archi¬ 
tecture, we describe a number of lock algo¬ 
rithms and their use of hardware resources. 
We then observe the performance of each 
lock from the perspective of both the program 
itself and the total system performance. 


Architecture 

Sequent’s Symmetry series is a bus- 
based shared-memory multiprocessor 6 
(see Figure 1). A machine can contain from 
two to 30 CPUs with an aggregate per¬ 
formance of around 150 million instruc¬ 
tions per second. Each processor subsys¬ 
tem contains a 32-bit microprocessor, a 
floating-point unit, an optional floating¬ 
point accelerator, and a private cache. The 
system features a 53-megabyte-per-sec- 
ond pipelined system bus, up to 240 mega¬ 
bytes of main memory, and a diagnostic 
and console processor. A Symmetry Model 
C system with a 20-megahertz Intel 80386/ 
Figure 1. Sequent Symmetry hardware. 


80387 and 128-kilobyte local caches was 
used for the experiments. Each processor 
also has a 32-bit private counter incre¬ 
mented every microsecond. The counters, 
synchronized when the hardware is initial¬ 
ized, serve as a global time-of-day clock 
that every processor can access simultane¬ 
ously. The microsecond clock can be ac¬ 
cessed in an amount of time comparable to 
that for accessing cached memory, and 
without using the system bus. 
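
A free-running 32-bit microsecond counter wraps roughly every 71.6 minutes (2^32 microseconds), so interval arithmetic on it has to tolerate wraparound. The sketch below shows one way to do that with unsigned subtraction. The function names and the read_counter accessor are hypothetical, and sim_counter is a software stand-in for the hardware counter, not Sequent's interface.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds elapsed between two readings of a free-running 32-bit
 * counter. Unsigned subtraction is performed modulo 2^32, so the result
 * is correct even if the counter wrapped once between the readings. */
uint32_t elapsed_us(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Busy-wait for `us` microseconds. `read_counter` is a hypothetical
 * accessor for the per-processor counter. */
void delay_us(uint32_t us, uint32_t (*read_counter)(void))
{
    uint32_t start = read_counter();

    while (elapsed_us(start, read_counter()) < us)
        ;                           /* no bus traffic: the counter is local */
}

/* Simulated counter for experimentation: advances one tick per reading
 * and starts just below the wraparound point. */
static uint32_t sim_ticks = 0xFFFFFFF0u;

uint32_t sim_counter(void)
{
    return sim_ticks++;
}
```

With the simulated counter, a call such as delay_us(100, sim_counter) crosses the wraparound point and still terminates after 100 ticks.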

The Dynix operating system is a parallel 
version of Unix designed and implemented 
by Sequent for the Balance and Symmetry 
machines. It provides all services of AT&T 
System V Unix as well as those of Berkeley 
4.2 BSD Unix. 

Symmetry coherence protocols. The 

Symmetry system supports the Symmetry 
copy-back cache coherence protocol. 6 This 
protocol supports four cache states: inva¬ 
lid, private, shared, and modified. Private 
and modified are both exclusive states. 
The private state is read-exclusive and the 
modified state is write-exclusive. The 
cache coherence protocol is based on the 
concept of ownership. To perform a write 
operation, a cache must first perform an 
exclusive read operation on the bus (as¬ 
suming a cache miss) to gain ownership of 
the block. Only then can the block be 
updated in the cache. Thus, if another 
cache holds the block in a modified state, it 
must respond to the read-exclusive request 


and invalidate its copy. The responding 
cache asserts the “owned” line on the bus, 
indicating to memory not to respond to that 
request. For a nonexclusive read request 
on the bus, all caches that hold the block in 
a shared state will assert the “shared” line 
on the bus. The memory responds and the 
block is loaded into the requesting cache as 
“shared.” 

Coherent cache protocols such as this 
have the consequence that after the first 
read, subsequent reads will be satisfied in 
the cache with no bus traffic until the cache 
block is modified. This consequence is 
exploited in most spin locks. 

Synchronization mechanisms. The 

synchronization mechanism on the Sym¬ 
metry model uses cache-based locks. The 
locks are also ownership based; that is, a 
locked read from a processor is treated like 
a write operation by the cache controller. 
The cache controller performs an exclu¬ 
sive read operation on the bus (assuming a 
cache miss) to gain ownership of the block. 
The atomic operation is then completed in 
the cache. These locks are optimized for 
multiuser systems where locks are lightly 
contended and the critical sections are 
short. They are cache-based so that when 
these conditions exist, the lock and unlock 
operations can be done without any further 
bus access. They do not work well in some 
parallel applications where a lock is heav¬ 
ily contended. Several other software syn¬ 


chronization schemes can reduce conten¬ 
tion for the locks in the hardware. Inde¬ 
pendent studies by Anderson 7 and Sequent 
evaluated the performance of these 
schemes. This article describes the Se¬ 
quent study. 

Response latency. In general, caches in 
multiprocessor systems serve two masters, 
the processor and the bus. A cache must 
respond to bus requests when it owns a 
dirty block, and also to processor requests. 
The memory responds to only one proces¬ 
sor access at a time; hence, it can respond 
much faster. Therefore, a cache-to-cache 
transfer is usually slower than a memory- 
to-cache transfer. The Symmetry multi¬ 
processor system follows this pattern. 

Sequent’s system bus is a split-transac¬ 
tion bus. A fixed number of requests are 
allowed on the bus, and responses to re¬ 
quests are strictly ordered. Responses to 
earlier requests must occur before re¬ 
sponses to later requests are allowed on the 
bus. 

The number of requests allowed on the 
bus is optimized for the number of cycles 
required by a memory response, because 
memory responds to most bus requests. 
Cache responses, having longer latency, 
require more bus cycles than memory re¬ 
sponses. The additional bus cycles spent 
waiting for nonoptimal, slower-than- 
memory response are wasted, since no 
further requests can be put on the bus. 

To lock: 

for (;;) { 
    while (*lock == LOCKED) 
        ;           /* spin on the cached copy */ 
    if (atomic_exchange_byte(lock, LOCKED) != LOCKED) 
        break;      /* test-and-set obtained the lock */ 
} 


Figure 2. Pessimistic variation of the snooping lock. 


To lock: 

static private unsigned maxdelay = 0; 

{ 
    /* Try once; back off only if the lock is busy or the test-and-set loses. */ 
    if (*lock != UNLOCKED || 
        atomic_exchange_byte(lock, LOCKED) == LOCKED) { 
        for (maxdelay /= 2; ; maxdelay = 2 * maxdelay + 1) { 
            do { 
                while (*lock != UNLOCKED) 
                    ;                       /* spin until unlocked */ 
                delay(irand() & maxdelay);  /* delay using clock */ 
            } while (*lock != UNLOCKED);    /* check again */ 
            if (atomic_exchange_byte(lock, LOCKED) != LOCKED) 
                break;                      /* lock obtained */ 
        } 
    } 
} 



Figure 3. Delay-after-release variation of the collision avoidance lock. 


These cycles are called “hold” cycles. 
Thus, if a cache responds to a bus request, 
potentially useful bus cycles are wasted as 
hold cycles. 

Only the caches respond when a highly 
contested lock is accessed. Thus, many bus 
cycles during this operation are hold 
cycles, observed using the hardware moni¬ 
tor. We evaluated the software synchroni¬ 
zation schemes by observing these cycles 
during the test. 

Algorithms 

We investigated a simple test-and-set 
lock, locks with read snooping, collision 
avoidance locks, tournament locks, and a 
queuing lock. 

The algorithms are given here in the C 
language, although the actual measure¬ 
ments reported later were made on a hand- 
coded Intel 80386 assembly language ver¬ 
sion of each algorithm. We used the “asm 
function” capability to allow the assembly 
language to be expanded in line in the test 
program. 

Several functions require explanation. 
The atomic_exchange functions exchange 


the second argument with the memory 
value indicated by the first argument. The 
latter value is returned as the function re¬ 
sult. The myid variable, a unique process 
identification value, is a small positive 
integer. 

Simple lock. The simplest test-and-set- 
lock data structure can be a byte having 
two values: locked and unlocked. To ini¬ 
tialize a lock, 

char *lock; 

*lock = UNLOCKED; 

To lock, an atomic instruction is used to 
implement a test-and-set operation. Each 
process continues to test and set a byte in 
shared memory until it finds that the previ¬ 
ous value was zero. 

while (atomic_exchange_byte 
(lock, LOCKED) == LOCKED); 
Unlocking is done by clearing the byte to 
unlocked. 

(void) atomic_exchange_byte 
(lock, UNLOCKED); 

On the Sequent Symmetry this is done 


via an atomic exchange instruction to pre¬ 
vent the unlocking write from occurring 
between a read and write of another pro¬ 
cess’ test-and-set instruction. It is done for 
compatibility with preceding write- 
through models where the lock only pre¬ 
vents two atomic instructions from occur¬ 
ring at the same time. It does not exclude 
other read and write requests from occur¬ 
ring. The instruction often used, the ex¬ 
change instruction, is implicitly atomic in 
the Intel 80386 instruction set. 

Under contention, each waiting process 
continuously requests to read and modify 
the shared byte with a lock. The unlock 
operation must compete with lock opera¬ 
tions to access the byte. 
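
For readers without a Symmetry system, the simple lock can be approximated in portable C. The sketch below assumes C11 atomics in place of the 80386 exchange instruction; the type lock_t and the function names simple_lock, simple_try_lock, and simple_unlock are hypothetical, and atomic_exchange_byte is defined here only to mirror the name used in the text.

```c
#include <assert.h>
#include <stdatomic.h>

enum { UNLOCKED = 0, LOCKED = 1 };
typedef _Atomic unsigned char lock_t;

/* Store `value` in *lock and return the previous contents, atomically. */
static unsigned char atomic_exchange_byte(lock_t *lock, unsigned char value)
{
    return atomic_exchange(lock, value);
}

/* Simple lock: repeat the test-and-set until the old value was UNLOCKED. */
void simple_lock(lock_t *lock)
{
    while (atomic_exchange_byte(lock, LOCKED) == LOCKED)
        ;                           /* every retry is a read-modify-write */
}

/* One test-and-set attempt; returns 1 if the lock was obtained. */
int simple_try_lock(lock_t *lock)
{
    return atomic_exchange_byte(lock, LOCKED) != LOCKED;
}

/* Unlock with an atomic exchange, as described for the Symmetry. */
void simple_unlock(lock_t *lock)
{
    assert(atomic_load(lock) == LOCKED);    /* unlocking a free lock is a bug */
    (void)atomic_exchange_byte(lock, UNLOCKED);
}
```

Because every waiting iteration is a read-modify-write, a waiter never spins in its cache, which is the behavior the snooping locks are designed to avoid.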

Snooping locks. Snooping locks take 
advantage of cache coherency to eliminate 
bus transactions by waiting processes until 
the lock is released. 8 They have the same 
data structure and values as the simple 
lock. In fact, data structure, initialization, 
and unlocking are identical to those of the 
simple lock. Since only one cache block is 
involved, snooping locks present minimal 
bus traffic when an uncontested lock 
changes owners. On the other hand, the 
O(n^2) rush of bus traffic when the lock is 
released is still present. If caches are up¬ 
dated instead of invalidated, the O(n) lock 
attempts will still generate some bus traffic 
that will interfere with the process in the 
critical section as well as with other pro¬ 
cesses not involved in the lock. 

Optimistic variation. The optimistic 
variation of this algorithm improves the 
simple lock by limiting bus activity of 
waiting processes to times when they have 
a chance of getting the lock. 

To lock, each waiting process attempts a 
test-and-set on a shared byte, as before. If 
unlocked, it has obtained the lock. Other¬ 
wise, it reads the byte until it becomes 
unlocked before attempting another test- 
and-set. The cache satisfies further read 
requests until another process unlocks the 
lock or attempts a test-and-set, invalidat¬ 
ing the cache copy of the lock. 

The C code to lock is 

while (atomic_exchange_byte 
(lock, LOCKED) == LOCKED) 
while (*lock == LOCKED); 

This version produces no bus requests 
while a number of processes wait for the 
lock. When the lock is released, however, 
a flurry of competing test-and-sets — and 
later reads — flood the bus. If the lock is 
held a long time, the impact is unimportant. 
However, for short critical sections, 
the lock is released before the last spurt of 
activity has subsided, resulting in continuous 
bus saturation. 

struct tlock { 
    struct cache_block { 
        char slock;     /* snooping lock */ 
        char pad[15];   /* 16-byte cache block */ 
    } blk[32];          /* B^H */ 
}; 

Figure 4. Example data structure for a tournament lock. 


Pessimistic variation. A pessimistic 
variation is identical to the optimistic 
snooping lock except that it begins by read¬ 
ing the lock byte rather than by attempting 
an initial test-and-set. This is useful under 
contention, since it prevents the initial test- 
and-set of an arriving process from disturb¬ 
ing the waiting processes in the same way 
an unlock disturbs them. However, it in¬ 
creases the latency for noncontended locks, 
and it does nothing to solve the problem of 
contention that occurs when the lock is 
released. (See Figure 2.) 
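
The two snooping variations can be compared side by side in portable C. This is a sketch under the assumption that C11 atomics stand in for the 80386 instructions; the names are hypothetical, and rmw_count is instrumentation added for the example so that the number of test-and-set operations can be inspected.

```c
#include <assert.h>
#include <stdatomic.h>

enum { UNLOCKED = 0, LOCKED = 1 };
typedef _Atomic unsigned char lock_t;

static atomic_int rmw_count;        /* test-and-sets issued (sketch only) */

static unsigned char test_and_set(lock_t *lock)
{
    atomic_fetch_add(&rmw_count, 1);
    return atomic_exchange(lock, LOCKED);
}

/* Optimistic: test-and-set first, then spin on ordinary reads. */
void snoop_lock_optimistic(lock_t *lock)
{
    while (test_and_set(lock) == LOCKED)
        while (atomic_load(lock) == LOCKED)
            ;                       /* satisfied from the cache */
}

/* Pessimistic: read first, and test-and-set only when the lock looks free. */
void snoop_lock_pessimistic(lock_t *lock)
{
    for (;;) {
        while (atomic_load(lock) == LOCKED)
            ;                       /* satisfied from the cache */
        if (test_and_set(lock) != LOCKED)
            break;
    }
}

void snoop_unlock(lock_t *lock)
{
    assert(atomic_load(lock) == LOCKED);
    (void)atomic_exchange(lock, UNLOCKED);
}
```

Without contention both variations issue exactly one test-and-set. The pessimistic one pays for an extra read first, which is the added latency mentioned above.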

Collision avoidance locks. After study¬ 
ing the effects of snooping locks, Ander¬ 
son 7 proposed collision avoidance as a way 
to reduce contention. With this method, 
each waiting process delays a different 
amount of time before rechecking and at¬ 
tempting to obtain the lock. This reduces 
the number of unsuccessful test-and-set 
instructions and the resulting reads by other 
waiting processes. 

There are many possible variations of 
collision avoidance locks. In some the ini¬ 
tial attempt may vary, as in the snooping 
locks. In others the initial delay parameter 
may be a constant, or a value determined by 
experience. Anderson showed that expo¬ 
nential increases, preferred over linear 
increases, allow newly arriving processes 
to adjust rapidly to the optimal delay. He 
also found that the delay should not be 
increased when the lock is busy, but only 
when it is unlocked and a subsequent at¬ 
tempt to obtain it fails. Various combina¬ 
tions of spinning and/or polling can be 
used, either before or after the delay. 

We encountered several pitfalls in evalu¬ 
ating collision avoidance locks. If the 
maximum delay is inadequate and the delay 
is not increased exponentially, the perform¬ 
ance may degenerate suddenly as the num¬ 
ber of processes increases. The delay 
should be a function of bus speed, not 
processor speed. As we ran earlier algo¬ 
rithms parameterized for the slower pro¬ 
cessors existing at that time, we found that 
delays were inadequate. Moreover, Se¬ 
quent supports systems with mixed-speed 
processors. Our solution uses the microsec¬ 
ond clock to count out the delays. This 
allows the parameterization to span several 
generations of processors and reduces the 
effect of locking out slower processors. 

On the other hand, using too large a delay 


produces an extreme bias toward newly 
arriving processes. By storing the delay 
value in the lock, we found in one lock that 
newly arriving processes obtained 97 per¬ 
cent of the lock acquisitions. This short¬ 
term unfairness was masked by the fact 
that every tenth of a second Symmetry 
processors must process an unmaskable 
day-clock interrupt. This allows processes 
with a large delay to obtain the lock and 
switch roles with the previously dominant 
processes. The repeated switching of roles 
gives the appearance of fairness over the 
long term. A second method of counting 
the lock acquisitions in each 32-microsec¬ 
ond time interval for each process also 
identified grossly unfair locks. 

We chose two implementations based 
on Anderson’s work. Both check the lock 
each time before attempting to obtain it. 
Both initialize their maximum delay to half 
of its value when they last acquired the 
lock. Both increase the maximum delay by 
doubling and adding 1 (1, 3, 7, 15, 31...). 
Both increase the delay only after failed at¬ 
tempts to obtain the lock via a test-and-set, 
but not after a check of the lock using a read 
finds it busy. Both use a random number 
generator to compute the delay. 

The maximum delay was chosen to be 
127 microseconds, so that one to two pro¬ 
cesses, on the average, would check the 
lock in a near-empty critical section. This 
is adequate for rapid response to an unlock 
operation while also providing good con¬ 
tention relief. In checking the delays of 16 
processes obtaining the locks, we found 
that 22 percent were newly arriving pro¬ 
cesses, 38 percent had a maximum delay of 
63 microseconds, and 40 percent had the 
maximum delay of 127 microseconds. 

The lock data structure, initialization, 
and unlock operations are the same as those 
for the simple lock algorithm. One private 
cell per process holds the maximum delay, 
and another cell is used by the random 
number generator. 

Delay-after-release variation. The first 
variation waits for the lock to be released 


before delaying. The function irand() returns 
an integer. The delay(x) function uses 
the microsecond clock to delay for x microseconds. 
(See Figure 3.) 

Delay-between-reference variation. 
The second variation merely polls the lock 
after each delay. The tight spin is omitted 
from the previous algorithm. 
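
No listing is given for the second variation, so the following is only one possible reading of it, written in portable C. It assumes C11 atomics, uses rand() in place of irand(), applies the 127-microsecond ceiling mentioned earlier, and busy-waits on the C11 wall clock (timespec_get) instead of the Symmetry microsecond counter. The names grow_delay, backoff_lock, and backoff_unlock are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

enum { UNLOCKED = 0, LOCKED = 1, MAXDELAY_CAP = 127 };
typedef _Atomic unsigned char lock_t;

static _Thread_local unsigned maxdelay;    /* one private cell per thread */

/* Busy-wait for about `us` microseconds using the C11 wall clock. */
static void delay(unsigned us)
{
    struct timespec t0, t;
    long waited;

    timespec_get(&t0, TIME_UTC);
    do {
        timespec_get(&t, TIME_UTC);
        waited = (long)(t.tv_sec - t0.tv_sec) * 1000000L
               + (t.tv_nsec - t0.tv_nsec) / 1000;
    } while (waited < (long)us);
}

/* Next maximum delay after a failed test-and-set: 1, 3, 7, ... up to the cap. */
unsigned grow_delay(unsigned d)
{
    d = 2 * d + 1;
    return d > MAXDELAY_CAP ? MAXDELAY_CAP : d;
}

/* Delay-between-reference: poll the lock once after each random delay. */
void backoff_lock(lock_t *lock)
{
    if (atomic_load(lock) == UNLOCKED &&
        atomic_exchange(lock, LOCKED) != LOCKED)
        return;                             /* uncontested: no delay at all */
    for (maxdelay /= 2; ; maxdelay = grow_delay(maxdelay)) {
        do {
            delay((unsigned)rand() & maxdelay);
        } while (atomic_load(lock) != UNLOCKED);
        if (atomic_exchange(lock, LOCKED) != LOCKED)
            break;                          /* lock obtained */
    }
}

void backoff_unlock(lock_t *lock)
{
    (void)atomic_exchange(lock, UNLOCKED);
}
```

As in Figure 3, the maximum delay grows only after a failed test-and-set, never after a read that merely finds the lock busy.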

Both locks avoid a rush of bus activity as 
a lock is released. On systems that update 
stale values rather than invalidate them, 
the first variation would do slightly better. 
The second variation may be useful on 
systems without cached locks, since the 
rate of polling is already low enough that it 
doesn’t significantly affect bus operations. 

Collision locks have all the advantages 
of snooping locks and none of the disad¬ 
vantages. For uncontested locks, they have 
minimal latency. As contention increases, 
they still save plenty of bandwidth for 
processes not involved in the lock. 

Tournament locks. A second approach 
to reducing contention is to have a tree of 
locks of radix B and height H. The tree 
forms a tournament wherein winners of 
leaf lock contests become contestants at 
the next level. The winner of the root lock 
has permission to enter the critical section 
protected by the tree of locks. 

Each process uses its process identity to 
choose a random path from the root to a 
leaf lock. The process may contend only 
for locks on that path. While every process 
may contend for the root lock, the number 
of processes eligible to contend for a lock 
decreases by the radix of the tree at each 
level as we proceed toward the leaves. 
Thus, contention at the leaf locks can be 
made arbitrarily small as the number of 
leaves approaches the number of pro¬ 
cesses. 

Each lock must be allocated to a separate 
cache block to prevent interference be¬ 
tween processes manipulating different 
locks. The data structure for B = 2 and H = 
5 is shown in Figure 4. (Array element zero 
is wasted for convenience but need not be 
in practice.) 

struct q_lock {             /* the lock */ 
    char bytes[NPROCS]; 
    int who_was_last; 
    char this_means_locked; 
} the_lock; 

Figure 5. The queuing lock data structure. 


The tournament lock is initialized by 
setting each snooping lock in the tree to 
unlocked: 

for (i = 1; i < (1 << H); i++)      /* blk[1] .. blk[B^H - 1], with B = 2 */ 
    lock->blk[i].slock = UNLOCKED; 

The tournament lock is unlocked by un¬ 
locking the snooping root lock in the usual 
manner: 

(void) atomic_exchange_byte 
    (&lock->blk[1].slock, UNLOCKED); 

Pessimistic variation. The pessimistic 
version of the tournament lock assumes 
that there is high contention and enters 
competition at its leaf lock. Once it obtains 
the leaf lock, it can proceed toward the root 
lock. 

After a process obtains an interior lock, 
it releases the last lock it previously ob¬ 
tained. This allows another waiting pro¬ 


cess to follow the first process up the tree. 
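
One way to write the climb just described is sketched below. It is an illustration under stated assumptions, not the measured implementation: it uses C11 atomics, stores the tree in heap order (root at index 1, children of node i at 2i and 2i+1), and maps a process to a leaf with a simple modulo where the text calls for a random assignment. The names leaf_of, tlock_init, tlock_acquire, and tlock_release are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

enum { UNLOCKED = 0, LOCKED = 1 };
#define H 5                             /* levels in the tree, with B = 2 */
#define NNODES (1 << H)                 /* blk[0] unused, blk[1] is the root */

struct tlock {
    struct cache_block {
        _Atomic unsigned char slock;    /* snooping lock */
        char pad[15];                   /* 16-byte cache block */
    } blk[NNODES];
};

static void snoop_lock(_Atomic unsigned char *l)    /* pessimistic variation */
{
    for (;;) {
        while (atomic_load(l) == LOCKED)
            ;
        if (atomic_exchange(l, LOCKED) != LOCKED)
            break;
    }
}

static void snoop_unlock(_Atomic unsigned char *l)
{
    (void)atomic_exchange(l, UNLOCKED);
}

/* Leaf for process myid; the leaves are blk[NNODES/2] .. blk[NNODES-1]. */
int leaf_of(int myid)
{
    return NNODES / 2 + myid % (NNODES / 2);
}

void tlock_init(struct tlock *t)
{
    for (int i = 0; i < NNODES; i++)
        atomic_init(&t->blk[i].slock, UNLOCKED);
}

void tlock_acquire(struct tlock *t, int myid)
{
    int node = leaf_of(myid);

    assert(myid >= 0);
    snoop_lock(&t->blk[node].slock);
    while (node > 1) {
        snoop_lock(&t->blk[node / 2].slock);    /* win the parent ... */
        snoop_unlock(&t->blk[node].slock);      /* ... then free the child */
        node /= 2;
    }
}

void tlock_release(struct tlock *t)
{
    snoop_unlock(&t->blk[1].slock);
}
```

When tlock_acquire returns, only the root lock is held, so the release touches a single cache block.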

Each lock in the tree is the pessimistic 
snooping lock. The snooping lock is used 
because of greatly reduced contention at 
the leaf-level locks. (We have omitted the 
C code for this lock, since it does not 
contribute to clarity.) 

Assuming that the distribution of the 
process identity to leaves is random, each 
lock in the tree reduces contention by a 
factor of B. The overall reduction for the 
entire tree is B^H. 

We can expect this lock variation to 
have a latency of H times the cost of the 
optimistic snooping lock with at most B 
processes contending. So there is a sub¬ 
stantial minimum cost even if the lock is 
not contested. 

Optimistic variation. The optimistic 
version attempts to make the cost of con¬ 
tention relief and the latency proportional 
to the base B logarithm of the number of 
contending processes. It has two phases. In 
the first phase a newly arriving process 
uses a read to determine whether the root 
lock is free. If it is, it will try to obtain the 
lock via a test-and-set. If not, or if the test- 
and-set fails, it moves one step toward its 
leaf node (unless already at the leaf node) 
and tries to obtain that lock. Once it obtains 
a lock in this manner, it enters the next 
phase. 

The second phase is identical to the 
pessimistic tree lock. The process works 
its way back to the root lock. 
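
The two phases might be coded as in the sketch below. This is one interpretation of the description, with several assumptions: C11 atomics, a heap-ordered binary tree with the root at index 1, exactly one attempt per level on the way down, and repeated attempts only at the leaf. Padding of each lock to its own cache block is omitted for brevity, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

enum { UNLOCKED = 0, LOCKED = 1 };
#define H 5                         /* levels in the tree, with B = 2 */
#define NNODES (1 << H)             /* tree[0] unused, tree[1] is the root */

typedef _Atomic unsigned char slock_t;
static slock_t tree[NNODES];        /* zero-initialized, so all UNLOCKED */

/* Read first, then test-and-set; returns 1 if the lock was obtained. */
static int try_lock(slock_t *l)
{
    return atomic_load(l) == UNLOCKED &&
           atomic_exchange(l, LOCKED) != LOCKED;
}

/* Acquire the tournament lock for the process assigned to `leaf`.
 * Returns the level (0 = root) at which phase 1 obtained its first lock. */
int tlock_acquire_optimistic(int leaf)
{
    int depth = 0, node, won;

    assert(leaf >= NNODES / 2 && leaf < NNODES);

    /* Phase 1: start at the root and step toward the leaf after each
     * failed attempt; once at the leaf, keep trying there. */
    for (;;) {
        node = leaf >> (H - 1 - depth);     /* ancestor of leaf at this level */
        if (try_lock(&tree[node]))
            break;
        if (depth < H - 1)
            depth++;
    }
    won = depth;

    /* Phase 2: climb back to the root as in the pessimistic version. */
    while (node > 1) {
        while (!try_lock(&tree[node / 2]))
            ;
        (void)atomic_exchange(&tree[node], UNLOCKED);
        node /= 2;
    }
    return won;
}

void tlock_release(void)
{
    (void)atomic_exchange(&tree[1], UNLOCKED);
}
```

For an uncontested lock, phase 1 succeeds at the root and phase 2 does no work, so the cost is one read and one test-and-set.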

Each lock in the tree may be either an 
optimistic or a pessimistic snooping lock, 
since the initial phase provides the initial 
read that distinguishes one from the other. 
The version we tested used pessimistic 
snooping locks. (Again, we omit the C 
code for clarity.) 

to_lock(lock) 
struct q_lock *lock; 
{ 
    char who_is_ahead_of_me, what_is_locked; 

    hardware_lock(); 
    who_is_ahead_of_me = lock->who_was_last; 
    what_is_locked = lock->this_means_locked; 
    lock->who_was_last = myid; 
    lock->this_means_locked = lock->bytes[myid]; 
    hardware_unlock(); 

    while (lock->bytes[who_is_ahead_of_me] == what_is_locked) 
        ;   /* spin in cache */ 
} 

to_unlock(lock) 
struct q_lock *lock; 
{ 
    lock->bytes[myid] ^= 1; 
} 

Figure 6. The algorithms used to lock and free the queuing lock. 

Contention for the root lock is limited to 
B processes plus lucky newly arriving 
processes whose initial read determines 
that the root lock is in its unlocked state. 
This version has nearly the same conten¬ 
tion relief as the pessimistic version but 
much lower costs in terms of processing 
and bus transactions when the lock is not 
highly contested. The costs are compa¬ 
rable to the pessimistic snooping lock for 
no contention, and they grow proportion¬ 
ally with the logarithm of the number of 
waiting processes. However, the worst- 
case performance may be slightly worse 
than with the pessimistic variation when 
that pessimism is justified. At that point 
the optimistic version will do extra work, 
which may increase the latency. Partially 
offsetting this is the fact that newly arriv¬ 
ing processes may get lucky and obtain the 
lock with relatively little work. 

Queuing lock. A tree of locks can re¬ 
duce contention to two processes when B = 
2. But what if B = 1? Assuming that H is 
sufficiently large, the contention is limited 
to a single enqueue operation, performed 
by new processes as they arrive. This per¬ 
mits the hand-off of the lock to be free of 
contention. With a little more optimization 
on the intermediate locks, we have a queue 
lock. Anderson arrived at a similar queue 
lock independently. 

This lock requires a more complex data 
structure. Instead of a single byte to indi¬ 
cate whether the lock is locked or un¬ 
locked, we now have one such byte per 
process. The identity of the process that 
last attempted to acquire the lock is re¬ 
corded in the lock. Finally, instead of the 
fixed values locked and unlocked, we have 
the last process that attempted to acquire 
the lock deciding what value represents 
“locked.” (See Figure 5.) 

A newly arriving process sets a hard¬ 
ware lock (see Figure 6). It reads the iden¬ 
tity of the process that arrived ahead of it 
and the value the previous process chose to 
represent “locked.” It then places its own 
identity and its own byte’s value into the 
lock and releases the hardware lock. Using 
the locked value of the previous process, it 
then waits for the byte value of that process 
to differ from locked. 

Table 1. Communication operations to lock and unlock a contended lock. 

                 Noncritical                              Critical Section 
Algorithm        Arrival               Failure/Wait       Success    Unlock 
Simple           -                     k*n*M              M          M or W 
Snoop1(O)        M                     n*(M+n*R)          R+M        M or W 
Snoop2(P)        R                     (n-1)*(M+n*R)      R+M        M or W 
Back_rel         M or R                n/c*n/c(R+M/c)     R+M        M or W 
Back_ref         M or R                n/c(R+M/c)         R+M        M or W 
Tournament(P)    (2*h-1)*(R+M)         R+M                R+M        M or W 
Tournament(O)    (2*log(n)-1)*(R+M)    R+M                R+M        M or W 
Queue            M+R                   -                  R          W 

Key: 
R  read access 
W  write access 
M  read-modify-write access 
n  is the number of processes 
c  is the ratio of the average delay time to the critical-section time for n processes 
k  is the number of attempts that can be made in the critical-section time 
h  is the height of the tree of locks 

To free the lock, the process simply 
changes the value in its own byte. To initialize 
the lock, we simply set the process 
identity to any process and set the 
“this_means_locked” value unequal to the 
value of that process’ byte: 

lock->who_was_last = 0; 

lock->this_means_locked = 
    lock->bytes[0] ^ 1; 

This alternation of “locked” values prevents
a race condition. If a process were to
set the value of its cell to locked before
trying for the lock, and to unlocked afterward,
the process behind it might not “see”
the unlocked value before the first process
tried to acquire the lock again and set the
value back to locked. This commonly occurs
when the lock is acquired by the same
process twice in a row.
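
The queue lock described above can be sketched in portable C. This is a minimal illustration under stated assumptions, not the Symmetry implementation: it uses C11 atomics in place of the 32-bit exchange instruction, identifies a process by a small integer index rather than by the address of its lock byte, and the names qlock_t, cell_t, MAX_PROCS, and CACHE_LINE are hypothetical.

```c
#include <stdatomic.h>

#define MAX_PROCS  32   /* assumed upper bound on processes */
#define CACHE_LINE 64   /* assumed cache-block size in bytes */

/* One cell per process, padded so each sits in its own cache block. */
typedef struct {
    _Alignas(CACHE_LINE) atomic_uint value;   /* only bit 0 is used */
} cell_t;

typedef struct {
    cell_t cell[MAX_PROCS];
    /* Packed word: (who_was_last << 1) | this_means_locked */
    atomic_uint last;
} qlock_t;

void qlock_init(qlock_t *q)
{
    for (int i = 0; i < MAX_PROCS; i++)
        atomic_init(&q->cell[i].value, 0u);
    /* who_was_last = 0, this_means_locked = cell[0] ^ 1: lock starts free */
    atomic_init(&q->last, (0u << 1) | 1u);
}

void qlock_acquire(qlock_t *q, unsigned me)
{
    /* My current bit is the value that will mean "locked" for my successor. */
    unsigned mine = atomic_load(&q->cell[me].value) & 1u;
    /* One atomic exchange: publish (me, mine), fetch the predecessor. */
    unsigned prev = atomic_exchange(&q->last, (me << 1) | mine);
    unsigned who = prev >> 1;
    unsigned locked = prev & 1u;
    while ((atomic_load(&q->cell[who].value) & 1u) == locked)
        ;   /* spin on the predecessor's cell only */
}

void qlock_release(qlock_t *q, unsigned me)
{
    /* Flipping my own bit frees the lock for whoever queued behind me. */
    atomic_fetch_xor(&q->cell[me].value, 1u);
}
```

Because the release flips the bit rather than writing a fixed "unlocked" value, a process that reacquires the lock immediately publishes the opposite bit as its new locked value, so a waiter behind its earlier turn still observes the change.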

On the Symmetry system the atomic
portion of the lock algorithm can be performed
with a single 32-bit integer exchange
instruction. Of course, the byte
values are placed in separate cache blocks
to avoid unwanted interference from adjacent
bytes. Also, the address of a process’
lock byte serves as its identity. By allocating
the bytes to even addresses and using
only one bit for the value, both values can
be packed into a word. This allows the
number of processes to be determined
dynamically.

This lock has some favorable properties
under contention. The number of bus transactions
for a contested lock is four. One
less read will be done in the uncontested
case because the read that “sees” the lock
in its locked state will not occur. Furthermore,
only one process, the “next” process,
will do the read after an unlock, and
it will have no competition. The remaining
waiting processes are safely off the bus and
out of the way. The absence of a write to
relock the lock during the hand-off means
that the contribution of the lock to the
critical section is minimal.

This queued lock has the disadvantage
that it does not trivially provide for a true
conditional lock. A conditional lock function
either acquires the lock or returns a
failure result if the lock was already
locked. It does not wait. While it is easy to
return failure if the lock is already locked,
there is no guarantee that a process that
fails its initial attempt will return quickly.
Also, if a process wishes to reclaim its cell
but was the last to obtain a lock, it must
obtain the lock merely to substitute a public
cell for its own cell.
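
For contrast, a conditional lock is trivial on a simple test-and-set lock, since the attempt is one atomic read-modify-write that never enqueues the caller. The following sketch illustrates only that contract; it uses the C11 atomic_flag type, and tas_lock_t and its functions are hypothetical names, not the Symmetry S_LOCK primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag held; } tas_lock_t;

void tas_init(tas_lock_t *l) { atomic_flag_clear(&l->held); }

/* Conditional lock: returns true if acquired, false if already locked.
 * It performs one atomic read-modify-write and never waits. */
bool tas_try_lock(tas_lock_t *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

void tas_unlock(tas_lock_t *l) { atomic_flag_clear(&l->held); }
```

In the queue lock, by comparison, the atomic exchange that reveals whether the lock is held has already placed the caller in the queue, which is why a failed attempt cannot simply return.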

Comparisons. The algorithms present
various trade-offs in terms of their memory
requirements, impact on communication
resources, uncontested latency, and contribution
to the critical-section time.

Memory requirements of the simple,
snooping, and collision avoidance locks
are minimal: a single byte. Tournament
locks require multiple cache blocks to
perform well. Queue locks require memory
in proportion to the number of processes,
and the processes must act to reclaim
memory after using the lock.

The impact of communication operations
is heavily architecture dependent.
Some systems may allow only one atomic
read-modify-write operation, making
these operations significantly more costly
than writes. Systems with cache invalidation
may perform significantly more reads
for some algorithms than systems with
write-broadcast-update coherency.

Table 1 summarizes “rule of thumb”
estimation formulas for a single lock and
unlock operation on a lock with N contending
processes. The formulas are meant to
show the growth of operations as the
number of processes increases; they are
not meant to be predictors of performance,
even when properly parameterized. The
costs are labeled in terms of read (R), write
(W), and atomic read-modify-write (M)
operations. The operations performed in
the arrival and wait stages may affect
uncontested latency and total system performance.
Operations performed in successfully
obtaining a lock and releasing it
add to the duration of the critical section.

The simple lock clearly floods the communication
network. The contested snooping
lock also has unsatisfactory worst-case
behavior when the critical section is short.
Collision avoidance and tournament locks
seem promising. The queue lock should
perform optimally in the contested case,
but in the uncontested case it will move
three cache blocks from processor to processor
while the other locks will move only
one. That is, the “success” and “unlock”
operations are free when the lock is uncontested
for the byte locks.

As the number of processes becomes
large, the simple and snooping locks become
unreasonable, requiring O(N^2)
operations for each release with N processes
contending. The collision avoidance
algorithms using exponential delay
growth can adjust to a growing number of
processes, keeping the bus activity fairly
linear. Tournament locks slow the growth
to O(log N), increasing in value as the
number of processes becomes larger. Finally,
the queue lock remains unaffected
by the number of processes.
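
To make these growth rates concrete, the sketch below evaluates three of the Table 1 terms for a given number of processes. It is illustrative only: each R and M is counted as one operation (an assumption, since the relative costs are architecture dependent), the logarithm is assumed to be base 2 and rounded up, and the function names are hypothetical.

```c
/* Ceiling of log2(n) for n >= 1, computed without <math.h>. */
static unsigned ceil_log2(unsigned n)
{
    unsigned bits = 0;
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Optimistic snooping lock, wait stage: n * (M + n*R). */
unsigned long snoop_wait_ops(unsigned n)
{
    return (unsigned long)n * (1 + n);
}

/* Optimistic tournament lock: (2*log(n) - 1) * (R + M).
 * Returns 0 for n == 1, where the formula does not apply. */
unsigned long tournament_ops(unsigned n)
{
    unsigned lg = ceil_log2(n);
    return lg == 0 ? 0 : (2ul * lg - 1) * 2;
}

/* Queue lock, arrival: M + R, independent of n. */
unsigned long queue_arrival_ops(unsigned n)
{
    (void)n;
    return 2;
}
```

With 28 contending processes these give 812, 18, and 2 operations respectively, which is the quadratic, logarithmic, and constant behavior described above. As the article cautions, such figures show growth and are not performance predictions.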

The uncontested latency of the simple,
optimistic snooping, and collision avoidance
lock algorithms is optimally short.
The pessimistic snooping, optimistic tournament,
and queue lock algorithms are not
far behind. The pessimistic tournament
    #include "parallel.h"
    #include <stdio.h>

    #define M 10000000

    shared slock_t l;   /* the lock */
    int np;             /* the number of processes */
    int count;          /* a private counter */

    doit()
    {
        register int i, j;

        j = M / np;     /* allocate the iterations evenly */
        for (i = 0; i < j; i++) {
            S_LOCK(&l);
            count++;    /* the entire critical section */
            S_UNLOCK(&l);
            delay(1);   /* use microsecond clock */
        }
    }

    main(argc, argv)
    int argc;
    char *argv[];
    {
        S_INIT_LOCK(&l);
        if (argc != 2) {
            fprintf(stderr, "usage: test np\n");
            exit(1);
        }
        sscanf(argv[1], "%d", &np);
        m_set_procs(np);    /* request np processes */
        m_fork(doit);       /* run doit() in each process */
    }


Figure 7. Test code for the simple, snooping, and collision avoidance algorithms.


lock is extremely bad for uncontested 
locks, requiring the same amount of work 
in the best and worst cases. 

The queue lock contributes the least to 
the critical section. The simple lock would 
also be good except that the atomic-modify 
requests of waiting processes often delay 
the unlock operation, adding time to the 
critical section. The other algorithms are 
all equal and reasonable. 

Performance results 

Several experimental tests analyzed the 
behavior of high-contention locks on the 
system bus. A pathological case causing 
worst-case behavior on the system bus was 
devised. The program was designed to 
increment a counter N times, dividing the 
work evenly among the processes. The 
counter is incremented only once inside 
the critical section protected by the lock. 


The amount of computation in the critical
section is small compared with the amount
required for synchronization. While the
counter would be shared in actual applications,
a private counter was used for this
test so that the bus activity observed would
be totally attributable to the lock/unlock
operations. This also had the effect of
making the critical section shorter than it
would be otherwise, particularly when the
bus is heavily used.

Each process enters the critical section a
predetermined number of times. A
1-microsecond delay is inserted after the
lock is released. This allows other processes
to obtain the lock and the cache
block the lock is in. The releasing process,
therefore, can be considered a newly arriving
process with no special access to the
lock in its next attempt. We also tried a
10-microsecond delay, but the results did not
differ significantly from those of the
1-microsecond delay, taking into account the
fact that the delay effectively removes one
to two processes from contention.

Test code for the simple, snooping, and
collision avoidance algorithms appears in
Figure 7. The test code for the other algorithms
was identical except for substitution
of the more complex lock data structures.
The shared keyword indicates variables
in shared memory. The others are
per-process variables copied upon fork.
The program does approximately the same
amount of “useful work” regardless of the
number of processes.

The caches become the sole responders 
when a highly contested lock is accessed. 
Thus, during this operation many bus 
cycles are hold cycles, since caches are 
responding to the requests for the lock. 

We analyzed system behavior by examining
the number of hold cycles on the bus
caused by excessive cache-to-cache traffic
on the bus. This traffic is caused by the
locking activity. We wanted to determine
which of the software mechanisms for
synchronization produced hold cycles on
the bus. The other metrics used for measuring
performance were the real time for the
test to complete and the total bus use.

The Symmetry multiprocessor has built-in,
nonintrusive performance instrumentation
for measuring both hardware and
operating system performance. The hardware
instrumentation measures performance
of the cache and bus protocols. The
software instrumentation measures utilization
of processor, disk, and other operating
system functions.

Using the hardware monitor, we observed
the hold cycles caused by the locking
activity. We evaluated the software
synchronization schemes by observing the
hold cycles during the test.

The Symmetry Model C system used in
this experiment was configured with 30
processors. Model C is a copy-back system
with a two-way set-associative cache of
128 kilobytes. Model C supports the Symmetry
coherence protocol, as described
earlier. Two processors were not used in
the measurements. One was dedicated to
the performance monitor; the other was
reserved to handle interrupts and periodic
chores of the operating system. The monitor
does not cause any intrusion on the bus.
No other activity was present on the system.

Performance analysis. Figures 8-10
show the performance of simple, snooping,
collision avoidance (two variations),
tournament, and queue-based synchronization
mechanisms. Figure 8 shows real
time for all tests except for one using the
simple lock algorithm. We do not present
the simple lock results beyond eight processors
because we already know that lock
to be the poorest performing mechanism.
In fact, the test using simple locks would
take a very long time to complete. The two
snooping locks are indistinguishable and
are therefore presented as one curve.

The initial decrease in real time from
one to two processes results from the delay
after unlocking in the test driver. Without
the delay after unlocking, only the pessimistic
tree lock and the queue lock have an
initial decrease in execution time. This is
due to their strict first-in, first-out properties
and to the fact that only a relatively
small portion of the locking time is in the
critical section.

This delay removes one process from
contention. A separate test using the microsecond
clock should be used to time lock
acquisition for uncontested locks not in
cache.

Figure 9 shows hold cycles generated by 
all tests. Figure 10 shows all bus cycles, 
including hold cycles, generated by each 
test. 

The hold cycles introduced by contention
for locks start to inhibit the performance
of these tests. The useful bus utilization
for some tests is dwarfed by the hold
cycles. We must remember, though, that
this is a pathological case. In real parallel
applications the amount of computation is
much higher than in these tests. The
shared-memory architecture of Symmetry
supports medium- or large-grain parallelism
well, as the results described here
indicate.

Tests on the simple lock and the snooping
locks saturate the bus after five and 10
processes, respectively. As expected, the
simple lock generates the most hold cycles.
However, the snooping locks provide only
minimal relief, since practically no bus
bandwidth is available for other processes.

The next poorest performing algorithms
are the tournament algorithms. This is surprising,
since the additional delay introduced
reduces the number of hold cycles.
However, this delay (computation) generates
a lot of activity. This can be seen by the
increase in bus use. Thus, the total number
of bus cycles consumed is greater than the
number consumed by collision avoidance
locks.

The test using bottom-up (pessimistic)
tournament locks takes significantly less
time than the one using snooping locks.
However, this test takes much longer in
real time than one using top-down (optimistic)
tournament locks, until sufficient
processes are available to overcome the
latency of going through the levels. This
occurs at about four processes. Periodically,
between four and 10 processes and
between 20 and 26 processes, the optimistic
tournament generates more hold cycles
than the pessimistic version. However, the
smaller amount of work that it does allows
it to execute faster for smaller numbers of
processes. The trend indicates that as the
number of processes increases, the difference
between the two tests becomes
smaller. This is understandable, since
contention for the lock increases. Actually,
the pessimistic version will probably take
less time beyond 30 processes, the point at
which the extra work in the optimistic
version no longer pays off.


Simple: simple test-and-set lock
Snoop1: optimistic snooping lock
Tournament (P): pessimistic tree lock
Tournament (O): optimistic tree lock
Back_rel: delay-after-release collision avoidance lock
Back_ref: delay-between-reference collision avoidance lock
Queue: queuing lock

Figure 8. Real-time performance of simple, snooping, collision avoidance, tournament, and queue-based locks.


Snoop1: optimistic snooping lock
Tournament (P): pessimistic tree lock
Tournament (O): optimistic tree lock
Back_rel: delay-after-release collision avoidance lock
Back_ref: delay-between-reference collision avoidance lock
Queue: queuing lock

Figure 9. The bus hold cycles generated in testing the various locks.


Snoop1: optimistic snooping lock
Tournament (P): pessimistic tree lock
Tournament (O): optimistic tree lock
Back_rel: delay-after-release collision avoidance lock
Back_ref: delay-between-reference collision avoidance lock
Queue: queuing lock

Figure 10. The bus cycles and hold cycles generated in the tests.

Bus use is higher for the test using the 
optimistic tournament locks than for that 
using the snooping locks. This is because 
more work is done to reduce contention for 
the main lock. The hold cycles on the bus 
significantly decrease with the test using 
the optimistic tournament locks. However, 
the hold cycles are not completely gone, 
since some contention still occurs. 

Note that collision avoidance locks perform
well across the entire range of processes.
They have the least amount of bus
traffic for small numbers of processes and
reasonably flat actual time curves. The
difference in execution time is negligible,
but bus use differs significantly. And while
the delay-after-release version has greatly
reduced bus traffic after the lock is released,
the polling version eliminates that
read altogether. Caches that update rather
than invalidate would probably make this
difference negligible.
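
The general idea of collision avoidance with exponential delay growth can be sketched as follows. This is an assumption-laden illustration, not the article's Back_rel or Back_ref code: it delays between successive test-and-set attempts, the delay bounds are arbitrary, and backoff_lock_t, BACKOFF_MIN, and BACKOFF_MAX are hypothetical names.

```c
#include <stdatomic.h>

typedef struct { atomic_flag held; } backoff_lock_t;

#define BACKOFF_MIN 1u      /* assumed initial delay, in spin iterations */
#define BACKOFF_MAX 1024u   /* assumed cap on the delay */

static void spin_delay(unsigned iterations)
{
    for (volatile unsigned i = 0; i < iterations; i++)
        ;   /* local work only: no shared-memory references here */
}

/* Acquire the lock; returns the number of failed attempts,
 * which is reported purely for illustration. */
unsigned backoff_lock(backoff_lock_t *l)
{
    unsigned delay = BACKOFF_MIN, failures = 0;
    while (atomic_flag_test_and_set(&l->held)) {
        failures++;
        spin_delay(delay);      /* stay off the bus between references */
        if (delay < BACKOFF_MAX)
            delay *= 2;         /* exponential delay growth */
    }
    return failures;
}

void backoff_unlock(backoff_lock_t *l) { atomic_flag_clear(&l->held); }
```

A lock declared as `backoff_lock_t l = { ATOMIC_FLAG_INIT };` starts free. The longer a process has waited, the less often it touches the lock, which is what keeps bus activity roughly linear as contention grows.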

The test using the queue-based locks
shows the best performance at the high
end. The real time for the test is identical
after two processes because the algorithm
ensures that one process is always contending
for the lock. The hold cycles for
this test are negligible. Bus use for these
locks is constant as the number of processes
increases.

While the queue lock is comparable to
the collision avoidance locks for small
numbers of processes, it consumes more
bus bandwidth than the backoff-reference
lock for up to 15 processes. This suggests
that the backoff-reference lock is preferred
when, instead of just one, several locks are
contested simultaneously. For large scalable
systems, queue locks would be better.


Locks rather than shared data represent
the real hot spots in these systems
for some parallel applications.
We observed a problem with highly
contested locks on the present Symmetry
system for such applications. Several software
synchronization schemes were developed
and evaluated to reduce the bus
traffic caused by this synchronization
mechanism under high contention. We
used a hardware monitor to observe and
evaluate performance of these schemes.
Each scheme showed markedly different
effects in both the number and kind of
requests generated under contention.

The simple lock algorithm should never
be used on systems with support for cache
coherence. Locks that may encounter contention
should be protected with some form
of collision avoidance. Snooping locks are
probably inadequate for all but modest
numbers of processes. The backoff-reference
lock is recommended for general use,
where contention for multiple locks is
common. Due to its low bus use, more bus
bandwidth remains for processes not participating
in the contested locks.

When many processes contend for a
single lock, a queue lock gives the best
execution time. The queue lock makes a
small contribution to the critical section as
a result of its one-to-one interprocess
communication, and it is totally insensitive
to the number of processes. An example
of this might be a parallel loop,
where the number of processes used from
loop to loop is unchanged.

The hardware mechanism should support
highly contested locks more efficiently,
but not at the expense of increasing
latency for lightly contested locks. This
can be done by adding a write-broadcast
capability on bus-based systems. This allows
the process that releases the lock to
update other caches spinning on the lock
without bus activity. Further bus activity
resulting from the acquisition of locks can
be reduced by read snooping. Even systems
with broadcast capabilities require
the proper software algorithms. Polling
locks require only primitive test-and-set
support, although other locks and other
synchronization operations benefit from
additional hardware support. More elaborate
hardware schemes are unnecessary
even when considering larger nonbus-based
shared-memory multiprocessor systems. ■
Architectures


New coherence schemes scale beyond single-bus-based, 
shared-memory architectures. This report describes three research 
efforts: one multiple-bus-based and two directory-based schemes. 


Introduction 

Shreekant Thakkar, Sequent Computer Systems 
Michel Dubois, University of Southern California 

Anthony T. Laundrie and Gurindar S. Sohi, University of Wisconsin-Madison 


There are two forms of shared-memory
multiprocessor architectures: bus-based
systems such as Sequent’s Symmetry,
Encore’s Multimax, SGI’s Iris, and
Stardent’s Titan; and switching network-based
systems such as BBN’s Butterfly, IBM’s
RP3, and New York University’s Ultra.
Because of the efficiency and ease of the
shared-memory programming model,
these machines are more popular for parallel
programming than distributed multiprocessors
such as NCube or Intel’s iPSC.
They also excel in multiprogramming
throughput-oriented environments. Although
the number of processors on a
single-bus-based shared-memory multiprocessor
is limited by the bus bandwidth,
large caches with efficient coherence and
bus protocols allow scaling to a moderate
number of processors (for example, 30 on
Sequent’s Symmetry).

Bus-based shared-memory systems use
the bus as a broadcast medium to maintain
coherency; all the processors “snoop” on
the bus to maintain coherent information in
the caches. The protocols require the data
in other caches to be invalidated or updated
on a write by a processor if multiple copies
of the modified data exist. The bus provides
free broadcast capability, but this
feature also limits its bandwidth.

New coherence schemes that scale beyond
single-bus-based, shared-memory
architectures are being proposed now that
the cost of high-performance interconnections
is dropping. Current research efforts
include directory-based schemes and
multiple-bus-based schemes.

0018-9162/90/0600-0071$01.00 © 1990 IEEE

Directory-based schemes. Directory-based
schemes can be classified as centralized
or distributed. Both categories support
local caches to improve processor
performance and reduce traffic in the
interconnection. The following “coherence
properties” form the basis for most of
these schemes:

• Sharing readers. Identical copies of a 
block of data may be present in several 
caches. These caches are called readers. 

• Exclusive owners. Only one cache at a 
time may have permission to write to a 
block of data. This cache is called the 
owner. 

• Reader invalidates. Before a cache can 
gain permission to write to a block (that is, 
become the owner), all readers must be 
notified to invalidate their copies. 

• Accounting. For each block address, 
the identity of all readers is somehow 
stored in a memory-resident directory. 

Presence flags. One cache-coherence
scheme, proposed by Censier and Feautrier 1
in 1978, uses presence flags. In each
memory module, every data block is appended
with a single state bit followed by
one presence flag per cache. Each presence
flag is set whenever the corresponding
cache reads the block. As a result, invalidation
messages need only be sent to those
caches whose presence bits are set.


P = Processor
C = Cache
M = Memory
I/O = Input/output
C/I = Cluster interface
N/I = Network interface

Figure 1. Extension of single-bus architectures to multiple-bus architectures:
(a) one dimension; (b) two dimensions; (c) hierarchy.

In the presence-flag solution, unfortunately,
the size of each memory tag grows
linearly with the number of caches, making
the scheme unscalable. The tag will be
at least N bits, where N is the number of
caches in the system. There may also be
other bits stored in the directory to indicate
line state.
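
A presence-flag directory entry can be pictured as a bit vector. The sketch below is a minimal illustration assuming N = 32 caches so that the flags fit in one 32-bit word; dir_entry_t and the function names are hypothetical, and the single state bit is modeled as a dirty flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CACHES 32   /* assumed N; one presence flag per cache */

typedef struct {
    bool     dirty;      /* the single state bit */
    uint32_t presence;   /* bit i set: cache i has read the block */
} dir_entry_t;

/* A read by `cache` sets its presence flag. */
void dir_note_read(dir_entry_t *e, unsigned cache)
{
    e->presence |= (uint32_t)1 << cache;
}

/* A write by `writer` returns the mask of caches that must receive
 * invalidation messages and leaves the writer as the only holder. */
uint32_t dir_note_write(dir_entry_t *e, unsigned writer)
{
    uint32_t me = (uint32_t)1 << writer;
    uint32_t targets = e->presence & ~me;
    e->presence = me;
    e->dirty = true;
    return targets;
}

/* Tag size per block: N presence flags plus the state bit. */
unsigned dir_tag_bits(unsigned num_caches) { return num_caches + 1; }
```

Invalidations go only to the caches named in the returned mask, but the tag costs N + 1 bits for every block in memory, which is the linear growth noted above.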

Variations of this scheme use a broadcast
mechanism to reduce the number of
bits required in the directories. However,
this introduces extra traffic in the interconnection
and may degrade system performance.

In the central-directory scheme with
presence flags, a cache miss is serviced by
checking the directory to see if the block is
dirty in another cache. When necessary,
consistency is maintained by copying the
dirty block back to the memory before
supplying the data. The reply is thus serialized
through the main memory. To ensure
correct operation, the directory controller
must lock the memory line until the write-back
signal is received from the cache with
the dirty block. Write misses generate
additional invalidate messages for all
caches that have clean copies of the data.
Invalidate-acknowledgments must be received
before a reply can be sent to the
requesting cache. Note that the relevant
line is locked while this is being done.
Requests that arrive while a line is locked
must be either buffered at the directory or
bounced back to the source to be reissued
at a later time. This may cause a loss in
performance.


The performance of presence-flag
schemes is limited by conflicts in accessing
the main memory directory. The main
memory and the tags can be distributed to
improve the main memory’s performance.
However, the serialization of responses
through the main memory and the locking
of lines by the directory controller affect
the performance of these cache-coherence
schemes.

B pointers. Another alternative, being
pursued by Agarwal et al. 2 and by Weber
and Gupta, 3 requires each block to have a
smaller array of B pointers instead of the
large array of presence bits. Some studies
of application programs 2,4 suggest that,
because of the parallel programming
model used, a low value for B (perhaps 1 or
2) might be sufficient. Since each shared
data structure is protected by a synchronization
variable (lock), only the lock, not
the shared data structure, is heavily
contested in medium- to large-grain parallel
applications. Thus, the shared data only
moves from one cache to another during
computation, and only synchronization
can cause invalidation in multiple caches.

If B is small and the data is heavily shared,
processors can thrash on the heavily shared
data blocks. If B is large, the memory
requirements are worse than for the
presence-flag method.
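
The trade-off can be illustrated with a small sketch. Two things here are assumptions rather than details of the cited proposals: the overflow policy, which evicts the oldest recorded reader when all B pointers are in use, and the pointer width, taken as the ceiling of log2(N) bits. The names bptr_entry_t, B_POINTERS, and NO_CACHE are hypothetical.

```c
#define B_POINTERS 2      /* assumed B: pointers kept per block */
#define NO_CACHE   (-1)   /* marks an unused pointer slot */

typedef struct { int reader[B_POINTERS]; } bptr_entry_t;

void bptr_init(bptr_entry_t *e)
{
    for (int i = 0; i < B_POINTERS; i++)
        e->reader[i] = NO_CACHE;
}

/* Record `cache` as a reader. Returns the cache that must be
 * invalidated to make room, or NO_CACHE if none was displaced. */
int bptr_add_reader(bptr_entry_t *e, int cache)
{
    for (int i = 0; i < B_POINTERS; i++)
        if (e->reader[i] == cache)
            return NO_CACHE;            /* already recorded */
    for (int i = 0; i < B_POINTERS; i++)
        if (e->reader[i] == NO_CACHE) {
            e->reader[i] = cache;       /* free slot available */
            return NO_CACHE;
        }
    int victim = e->reader[0];          /* all slots full: evict oldest */
    for (int i = 1; i < B_POINTERS; i++)
        e->reader[i - 1] = e->reader[i];
    e->reader[B_POINTERS - 1] = cache;
    return victim;
}

/* Bits needed to name one of n caches: ceiling of log2(n). */
static unsigned pointer_bits(unsigned n)
{
    unsigned bits = 0;
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Directory bits per block for b pointers among n caches. */
unsigned bptr_tag_bits(unsigned b, unsigned n)
{
    return b * pointer_bits(n);
}
```

Under these assumptions, two pointers among 1,024 caches need 20 bits per block instead of 1,024 presence flags, while seven pointers among 32 caches need 35 bits, more than the 32 presence flags. Each displaced reader returned by bptr_add_reader is an extra invalidation, which is the source of thrashing when more than B caches share a block.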

Linked lists. Note that the presence-flag
solution uses a low-level data structure
(Boolean flags) to store the readers of a
block, while the B-pointers method saves
them in a higher level structure (a fixed
array). Perhaps more-flexible data structures,
such as linked lists, can be applied to
the problem of cache coherence. Distributing
the directory updates among multiple
processors, rather than a central directory,
could reduce memory bottlenecks in large
multiprocessor systems.

A few groups have proposed cache-coherence
protocols based on a linked list
of caches. Adding a cache to (or removing
it from) the shared list proceeds in a manner
similar to software linked-list modification.
Groups using this approach include
the IEEE Scalable Coherent Interface
(SCI) standard project, a group at the
University of Wisconsin-Madison, and
Digital Equipment Corporation in its work
with Stanford University’s Knowledge
Systems Laboratory. The SCI work, which
is the most defined of the three, is covered
in the report beginning on p. 74. Some
features of the Stanford Distributed Directory
(SDD) Protocol are outlined on pp.
78-80.
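
The software analogy can be sketched directly. The following is a generic doubly linked list indexed by cache number, with new readers added at the head; it is an illustration under those assumptions and does not reproduce the SCI, Wisconsin, or SDD protocols. The names sharing_list_t, sl_add, and sl_remove are hypothetical.

```c
#define NUM_CACHES 8      /* assumed number of caches */
#define NIL        (-1)   /* end-of-list marker */

typedef struct {
    int head;               /* kept with memory: first cache in the list */
    int next[NUM_CACHES];   /* kept in each cache's tag: next sharer */
    int prev[NUM_CACHES];   /* kept in each cache's tag: previous sharer */
} sharing_list_t;

void sl_init(sharing_list_t *s)
{
    s->head = NIL;
    for (int i = 0; i < NUM_CACHES; i++)
        s->next[i] = s->prev[i] = NIL;
}

/* A new reader becomes the head of the list.
 * Precondition: `cache` is not already in the list. */
void sl_add(sharing_list_t *s, int cache)
{
    s->next[cache] = s->head;
    s->prev[cache] = NIL;
    if (s->head != NIL)
        s->prev[s->head] = cache;
    s->head = cache;
}

/* Unlink a cache, for example when it replaces the block.
 * Precondition: `cache` is currently in the list. */
void sl_remove(sharing_list_t *s, int cache)
{
    int p = s->prev[cache], n = s->next[cache];
    if (p != NIL) s->next[p] = n; else s->head = n;
    if (n != NIL) s->prev[n] = p;
    s->next[cache] = s->prev[cache] = NIL;
}

/* Number of sharers, found by walking from the head. */
int sl_length(const sharing_list_t *s)
{
    int count = 0;
    for (int c = s->head; c != NIL; c = s->next[c])
        count++;
    return count;
}
```

Memory stores only the head, so its tag size does not grow with the number of sharers; the rest of the list lives in the cache tags, and each update touches only the neighbors of the affected cache.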

Bus-based schemes. Bus-based systems
provide uniform memory access to all
processors. This memory organization allows
a simpler programming model, making
it easier to develop new parallel applications
or to move existing applications
from a uniprocessor to a parallel system.
Since the bus transfers all data within the
system, it is the key to performance (and
a potential bottleneck) in all bus-based
systems.

Several architectural variations of
bus-based systems have been proposed. Below,
we describe two types: multiple-bus and
hierarchical architectures. (See Figure 1.)
One of these, the Aquarius multiple-bus
multiprocessor architecture, is described
in more detail in the report beginning
on p. 80.

Multiple-bus systems. An obvious way
to extend the single-bus system is to increase
the number of buses. Studies by Hopper et al. 5
show that a system with multiple buses can
provide higher bandwidth and performance
than a system with wider buses.
Multiple buses provide redundancy and
extra bandwidth.

A simple extension, splitting a single
bus into address and data buses, has a
limited performance gain. The next choice
is to duplicate buses. This scheme scales
the bandwidth linearly with the memory.
Synapse 6 had two buses, but they were
used for redundancy rather than bandwidth
(Figure 1a). (Arbitration and the coherence
protocols become complex with
multiple buses.)

The Wisconsin Multicube architecture 7
uses a grid of buses connecting the processors
to memory. A processor resides at
each crosspoint on the grid. The topology
allows the system to scale to 1,024 processors.
Each processor has a second-level
cache that snoops on both the vertical and
horizontal buses. (See Figure 1b.)

The Aquarius architecture is based on
the same topology as the Wisconsin Multicube.
However, it differs in the cache-coherence
mechanism and in the distribution
of memory modules. The system uses
a combination of a snoopy cache-coherence
protocol and a directory-based coherence
protocol to provide coherency in
the system. The memory is distributed per
node, unlike the Wisconsin Multicube.

Hierarchical systems. In hierarchical
systems, clusters of processors are connected
by a bus or an interconnection network.
(See Figure 1c.) In the three examples
below, the intercluster connection
is a bus, and the processors within a cluster
are also connected via the bus. This is
similar to a single-bus system.

Wilson 8 uses a simulation and an analytical
model to examine the performance
of a hierarchically connected multiple-bus
design. The design explores a uniform
memory architecture with global memory
at the highest level. It uses hierarchical
caches to reduce bus use at various levels
and to expand cache-coherency techniques
beyond those of a single-bus system.
The performance study showed that
degradation resulting from cache coherency
is minimal for a large system.

The Diffusion Machine 9 is a scalable
shared-memory system in which a hierarchy
of buses and data controllers link an
arbitrary number of processors, each having
a large set-associative memory. Each
data controller has a set-associative directory
containing the state information for
its data values. The controller supports remote
accesses by snooping on the next bus
in the hierarchy in both directions. A
cache-coherence protocol enables data
migration, duplication, and replacement.

The VMP-MC Multiprocessor 10 is another
hierarchically connected multiple-bus
multiprocessor system. The first-level
cluster comprises processor-cache pairs
connected by a bus and is similar to a
single-bus system. However, instead of
main memory, the bus has an interbus
cache module that interfaces to the next
bus in the hierarchy. This second-level
bus has the main memory. Again, this
system provides a uniform memory access.
In this system, larger clusters are
connected via a ring-based system to provide
a large, distributed shared-memory
system. ■
The Scalable Coherent Interface is a
local or extended computer “backplane”
interface being defined by an IEEE standard
project (P1596). The interconnection
is scalable, meaning that up to 64K processor,
memory, or I/O nodes can effectively
interface to a shared SCI interconnection.

The SCI committee set high-performance
design goals of one gigabyte per second
per node. As a result, bused backplanes
have been replaced by unidirectional

Project status and information sources 


Scalable Coherent Interface. 

Simulation of the coherence protocols
is now under way at the University of
Oslo and Dolphin Server Technology in
Oslo, Norway. The initial SCI simulation
efforts focus on proving the
specification’s correctness rather
than calibrating its performance.

Three University of Oslo researchers
(Stein Gjessing, Ellen Munthe-Kaas,
and Stein Krogdahl) are formally
specifying the intent of the
cache-coherence protocol and verifying
that the cache updates prescribed
by the SCI standard are specified
correctly.

The University of Wisconsin’s multicube
group is now working with the
SCI group.

IEEE’s SCI-P1596 working group
plans to freeze the base coherence
protocols by this summer. The group
will continue to explore optional coherence
extensions to improve the
performance of frequently occurring
sharing-list updates. If you have interests
in this area, please contact the
SCI-P1596 working group chair:

David B. Gustavson, Computation
Research Group, Stanford Linear
Accelerator Center, PO Box 4349, Bin
88, Stanford, CA 94309. Gustavson’s
phone number is (415) 926-2863, his
fax number is (415) 961-3530 or
926-3329, and his e-mail address is
dbg@slacvm.bitnet.

Stanford Distributed Directory. 

A group at Stanford University’s
Knowledge Systems Laboratory is
working on simulations to determine
the performance of their distributed-directory
scheme using linked lists.
Further information can be obtained
from Manu Thapar, Knowledge Systems
Laboratory, Department of
Computer Science, Stanford University,
701 Welch Road, Palo Alto, CA
94304. Thapar’s phone number is
(415) 725-3849; his e-mail address is
manu@ksl.stanford.edu.

Aquarius. The Aquarius group is 
evaluating the multi-multi architecture 
by simulation. Further information on 
that project can be obtained from Mi¬ 
chael Carlton, University of California 
at Berkeley, Division of Computer Sci¬ 
ence, 571 Evans Hall, Berkeley, CA 
94720. His phone number is (415) 
642-8299, and his e-mail address is 
carlton@ernie.berkeley.edu. 


point-to-point links. One set of input sig¬ 
nals and one set of output signals are de¬ 
fined. Packets are sent to the interconnec¬ 
tion through the output link, and packets 
are returned to the node on the input link. 

Although SCI only defines the interface 
between nodes and the external intercon¬ 
nection, the protocol is being validated on 
the least expensive and highest perform¬ 
ance interconnection topologies, as illus¬ 
trated in Figure 1. 

To support arbitrary interconnections, 
the committee abandoned the concept of 
broadcast transactions or eavesdropping 
third parties. Broadcasts are “nearly im¬ 
possible” to route efficiently, according to 
experienced switch designers, and are also 
hard to make reliable. Because of its large 
number of nodes and resulting high cumu¬ 
lative error rate, reliability and fault recov¬ 
ery are primary objectives of SCI. There¬ 
fore, its cache-coherence protocols are 
based on directed point-to-point transac¬ 
tions, initiated by a requester (typically the 
processor) and completed by a responder 
(typically a memory controller or another 
processor). 

Sharing-list structures. The SCI co¬ 
herence protocols are based on distributed 
directories. Each coherently cached block 
is entered into a list of processors sharing 
the block. Processors have the option to 
bypass the coherence protocols for locally 
cached data, as illustrated in Figure 2. 

For every block address, the memory 
and cache entries have additional tag bits. 
Part of the memory tag identifies the first 
processor in the sharing list (called the 
head); part of each cache tag identifies the 
previous and following sharing-list en¬ 
tries. For a 64-byte cache block, the tags 
increase the size of memory and cache 
entries by four and seven percent, 
respectively, compared to the traditional 
eavesdropping alternatives. However, snoopy 
protocols have other hidden costs; they 
require high-performance dual-ported 
cache-tag memories to allow execution of 
processor instructions while eavesdropping 
on other bus activity. 

Sharing-list additions. Initially, mem¬ 
ory is in the uncached state and cached 
copies are invalid. A read-cached transac¬ 
tion is directed from the processor to the 
memory controller. This changes the 
memory state from uncached to cached and 
returns the requested data. The data is re¬ 
turned and the requester’s cache-entry 
state is changed from the invalid to the 
head state. 

For subsequent accesses, the memory 
state is cached, and the head of the sharing 
list has the (possibly dirty) data. A new 
requester (Cache A) directs its read-cached 
transaction to memory, but receives a 
pointer to Cache B instead of the requested 
data. A second cache-to-cache transaction, 
called prepend, is directed from Cache A to 
Cache B. On receiving the request, Cache 
B sets its backward pointer to point to 
Cache A and returns the requested data, as 
illustrated in Figure 3. 

The dotted arrow in Figure 3 illustrates 
a transaction directed between the proces¬ 
sor (the requester) and memory or another 
processor (the responder). The solid line 
illustrates the sharing-list pointers. Note 
that memory cannot always forward the 
request directly to Cache B — that would 
create potential deadlocking dependencies. 


Figure 1. Scalable Coherent Interface (SCI) interconnection models. 



Figure 2. Distributed cache tags. 



Figure 3. Sharing-list additions. 


Unlike the central-directory schemes, 
request transactions are never blocked at 
the memory controller; instead, all re¬ 
quests are immediately prepended to the 
head of the existing sharing list. Requests 
are added in FIFO order, as defined by the 
arrival of coherent requests at the memory 
controller. 
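
The list-addition steps described above can be sketched in a few lines of Python. This is only an illustrative model, not the SCI specification: the class names, the function names, and the "mid" and "tail" state labels are assumptions introduced here, and each transaction (a request plus its response) is counted as one step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CacheEntry:
    state: str = "invalid"          # "invalid", "head", "mid" or "tail" (simplified labels)
    forward: Optional[str] = None   # next entry, toward the tail
    backward: Optional[str] = None  # previous entry, toward the head


@dataclass
class Memory:
    state: str = "uncached"
    head: Optional[str] = None      # memory tag: first processor in the sharing list


def read_cached(memory: Memory, caches: Dict[str, CacheEntry], requester: str) -> int:
    """Add requester to the sharing list; return the number of transactions used."""
    entry = caches[requester]
    old_head = memory.head
    # Transaction 1: requester -> memory. Memory never blocks; it records the new head.
    memory.state = "cached"
    memory.head = requester
    entry.state = "head"
    entry.backward = None
    entry.forward = old_head
    if old_head is None:
        return 1                    # memory was uncached, so it returned the data itself
    # Transaction 2 (prepend): requester -> old head, which returns the data.
    old = caches[old_head]
    old.backward = requester
    old.state = "tail" if old.forward is None else "mid"
    return 2


def sharing_list(memory: Memory, caches: Dict[str, CacheEntry]) -> List[str]:
    """Walk the forward pointers from the head to the tail."""
    names = []
    node = memory.head
    while node is not None:
        names.append(node)
        node = caches[node].forward
    return names
```

Because every request is prepended at the head, the most recent requester is always first in the list returned by sharing_list.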

Sharing-list removals. The head of the 
list has the authority to purge other entries 
to obtain an exclusive (and therefore 
modifiable) entry. The initial transaction 
to the second sharing-list entry purges that 
entry from the sharing list and returns its 
forward pointer. The forward pointer is 
used to purge the next (previously the 
third) sharing-list entry. The process 
continues until the tail entry is reached, as 
illustrated in Figure 4. As an option, the 
purges can be forwarded directly through 
the sharing-list entries. 

Figure 4. Head purging other entries. 

Note that purge latencies increase line¬ 
arly with the number of sharing readers. 



Figure 5. Optimized direct-memory-access reads. 



Figure 6. Request combining. 


Since purge list sizes are often small, the 
linear latencies may be acceptable in many 
configurations. 
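
The linear cost of purging can be seen in a small Python sketch. It models only the forward pointers of the sharing list as a dictionary; the function name and data layout are assumptions made for illustration, and the optional forwarding of purges through the list is not modelled.

```python
from typing import Dict, Optional


def purge_others(head: str, forward: Dict[str, Optional[str]]) -> int:
    """Purge every entry behind the head; return the number of purge transactions.

    `forward` maps each sharing-list entry to the next entry toward the tail
    (None at the tail). Purged entries are removed from the mapping.
    """
    transactions = 0
    target = forward[head]
    while target is not None:
        # The purged entry leaves the list and returns its forward pointer,
        # which tells the head which entry to purge next.
        next_target = forward.pop(target)
        transactions += 1
        target = next_target
    forward[head] = None            # the head now holds the only (exclusive) copy
    return transactions
```

With one head and k other sharers the loop issues k sequential transactions, which is the linear latency noted above.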

Entries can also delete themselves from 
the list when they are needed to cache other 
block addresses. Since the linked list is 
distributed and doubly linked, multiple 
entries can delete themselves simultane¬ 
ously. Special precedence rules are ap¬ 
plied to avoid corruption of pointers when 


adjacent deletions are initiated simultane¬ 
ously. To ensure forward progress, the 
entry closest to the tail has priority and is 
deleted first. 

Standard optimizations. The basic 
coherence-protocol operations have been 
optimized to improve the performance of 
frequent events. We are considering other, 
more complex optimizations to improve 


the performance of large system configu¬ 
rations. These compatible extensions to 
the basic coherence protocols will be in¬ 
cluded as part of the SCI standard. 

An optimized direct-memory-access 
controller generates read-check transac¬ 
tions to fetch its data from memory. If the 
addressed location is clean, the data is 
returned directly from memory; otherwise, 
the processor is redirected to the current 
sharing-list head. Thus, the DMA control¬ 
ler can fetch its data without joining the 
sharing list, as illustrated in Figure 5. 

The frequent one-writer/one-reader 
(producer/consumer) form of data sharing 
is optimized. The invalidation of the writer 
(head) and the data fetches of the reader 
(tail) are both performed as direct cache- 
to-cache transactions between the head and 
tail of an established sharing list. 

Request combining. One useful feature 
of linked-list coherence is the possibility 
of combining list-insertion requests in the 
interconnection to eliminate hot spots at or 
near heavily shared memory controllers. 
Such hot spots degrade performance not 
only of the requesting processor but also of 
other transactions that share portions of the 
congested connection path. 

While queued in an active switch buffer, 
two requests to the same physical memory 
address (read A and read B) can be com¬ 
bined. The combining generates one re¬ 
sponse (status A), which is immediately 
returned to one of the requesters, and one 
modified request (read A-B), which is 
routed towards memory. Additional re¬ 
quests (read C) can also be combined with 
the modified request, as illustrated in Fig¬ 
ure 6. 

Read transactions and add transactions 
(add to previous value) can be combined in 
the interconnection or at the memory 
controller’s front end. Coherent-request 
combining is simpler than noncoherent 
fetch-and-add combining, 1 since state need 
not be saved in the interconnection while 
the modified request is being forwarded to 
memory. 

SCI’s optional extensions. The latency 
of distributing data or purges to large 
numbers of readers currently scales 
linearly with the number of read-sharing 
processors. We are investigating the use of 
redundant pointers to reduce these linear 
delays to logarithmic latencies (order 
log N, where N is the number of 
read-sharing processors). The redundant 
pointers can be created while the request 
combining is being performed, to provide the 
binary tree-like structure illustrated in 
Figure 7. 

Figure 7. Redundant sharing-list pointers. 

The redundant pointers could be used by 
multiple readers, to request early copies of 
heavily shared data, or by a writer, to 
quickly purge stale copies when a new data 
value is written. 

Synchronization. In shared-memory 
architectures, locks are the primary form 
of synchronization for large-scale multi¬ 
processors and must be handled effi¬ 
ciently. The SCI options include efficient 
synchronization primitives for large-scale 
multiprocessors. A queued-on-lock-bit 
idea, described by Goodman, Vernon, and 
Woest, 2 provides FIFO access to synchro¬ 
nization variables. Since linked cache en¬ 
tries form a queue, little additional hard¬ 
ware is needed to implement an SCI vari¬ 
ant of this scheme. The advantage of the 
queued-lock scheme is that (except for 
replacements) lock requests are serviced in 
FIFO order and only O(N) transactions are 
generated. ■ 
Distributed-Directory Scheme: 

Stanford Distributed-Directory 
Protocol 

Manu Thapar and Bruce Delagi, 

Digital Equipment Corporation and Stanford University 


The Stanford Distributed-Directory 
(SDD) coherence protocol is based on a 
singly linked list of distributed directories. 
Sharing-list additions and removals are 
handled differently than in the Scalable 
Coherent Interface (SCI) protocol. Figure 1 
shows the operations used to add a cache to 
the list. A shaded arrow depicts a single 
message between two nodes. (Note the 
distinction from the dotted arrow used in 
the preceding SCI report to indicate a trans¬ 
action involving two messages, a request 
and a reply.) 

Reads. On a read miss, a new requester 
(Cache C) sends a read-miss message to 
memory as shown in Figure 1. The memory 
updates its head pointer to point to the 
requestor and sends a read-miss-forward 
signal to the old head (Cache B). On receiv¬ 
ing the request, Cache B returns the re¬ 
quested data along with its address as a 
read-miss-reply. When the reply is received 
at Cache C, the data is copied and the 
pointer is made to point to Cache B. Read 
misses thus result in three messages instead 
of four as in the case of the SCI protocol. 
The three-message scheme is not always 
safe to use; when queues are full, the SCI 
four-message scheme must still be sup¬ 
ported to avoid message-queue deadlocks. 
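
The three-message read miss can be traced with a short Python sketch. The function name, the tuple layout of a message, and the handling of an empty list (memory replying directly) are assumptions made for illustration; the source describes only the case in which an old head exists.

```python
from typing import Dict, List, Optional, Tuple

Message = Tuple[str, str, str]      # (kind, sender, receiver)


def sdd_read_miss(head: Optional[str],
                  pointers: Dict[str, Optional[str]],
                  requester: str) -> Tuple[str, List[Message]]:
    """Add requester to the singly linked list; return (new head, messages sent)."""
    messages: List[Message] = [("read-miss", requester, "memory")]
    if head is None:
        # Assumption: with no cached copy, memory supplies the data itself.
        messages.append(("read-miss-reply", "memory", requester))
        pointers[requester] = None
        return requester, messages
    # Memory points at the requester and forwards the miss to the old head.
    messages.append(("read-miss-forward", "memory", head))
    # The old head returns the data together with its own address.
    messages.append(("read-miss-reply", head, requester))
    pointers[requester] = head      # the requester now points to the old head
    return requester, messages
```

Each message is one-way, so the three tuples correspond to the three shaded arrows of the read path.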

Writes. For writes, write buffering along 
with weak ordering 1 lets the processor 
proceed immediately without stalling. A 
write is considered issued when the cache 
sends a write miss. A write is considered 
performed when the cache receives a 
write-miss-reply. On a write miss, a re¬ 
questor (Cache D) sends a write-miss mes¬ 
sage to memory, as shown in Figure 2. The 
memory updates its head pointer to point to 
the requester and sends a write-miss-forward 
signal to the old head (Cache C). 
Cache C invalidates its copy and sends a 


write-miss-forward signal to the next 
cache in the list (Cache B). Cache C also 
sends the data to Cache D as a write-miss- 
reply-data signal. When Cache B receives 
the write-miss-forward signal, it invali¬ 
dates its copy and sends a write-miss-for¬ 
ward signal to the next-cache (Cache A). 
When the write-miss-forward signal is 
received by the tail (Cache A) or by a cache 
that does not have a copy of the line — a 
case that may happen on replacement — 
that cache sends a write-miss-reply to the 
requestor. The write is considered per¬ 
formed when the requestor has received 
both the write-miss-reply-data and the 
write-miss-reply. Write misses result in 
about half the messages required for the 
base SCI protocol. 
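
A write miss can be traced the same way. The sketch below is a simplified model with assumed names: it requires an existing list, ignores the replacement case in which a cache without a copy ends the chain early, and does not model pending signals.

```python
from typing import Dict, List, Optional, Tuple

Message = Tuple[str, str, str]      # (kind, sender, receiver)


def sdd_write_miss(head: str,
                   pointers: Dict[str, Optional[str]],
                   requester: str) -> Tuple[str, List[Message]]:
    """Invalidate the whole list for a writer; return (new head, messages sent)."""
    messages: List[Message] = [
        ("write-miss", requester, "memory"),
        ("write-miss-forward", "memory", head),
        # The old head supplies the data directly to the requester.
        ("write-miss-reply-data", head, requester),
    ]
    node = head
    while pointers[node] is not None:
        next_node = pointers.pop(node)          # node invalidates its copy
        messages.append(("write-miss-forward", node, next_node))
        node = next_node
    pointers.pop(node)                          # the tail invalidates its copy
    messages.append(("write-miss-reply", node, requester))
    pointers[requester] = None                  # the writer is now the only entry
    return requester, messages
```

For a list of k caches the sketch sends k + 3 one-way messages, and the write is performed once both the write-miss-reply-data and the write-miss-reply have reached the requester.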

Write misses in caches that are part of 
the shared list are handled similarly and are 
described in detail in an earlier report. 2 

The latency of write misses may be a 
cause for concern, since caches linked in 
the list must be invalidated sequentially. 
However, if writes to a line occur fre¬ 
quently, the number of caches to be invali¬ 
dated between writes will be small. Thus, 
cases when the latency is large will be 
infrequent. Additionally, write buffering 1 
is used to reduce the effect of the sequen¬ 
tial invalidation operations. 

Pending signals. A cache line would be 
in the writing-or-reading state after gener¬ 
ating a read miss or a write miss and before 
receiving a read-miss-reply or a write- 
miss-reply. If the cache line is in the writ¬ 
ing-or-reading state and receives a read- 
miss-forward or a write-miss-forward sig¬ 
nal, the forwarded signal is stored in the 
line’s cache-pointer field. The state is 
changed to note that a forwarded signal has 
been stored. These stored signals, called 
pending signals, are serviced when the 
reply to the local read or write miss is 


received. If multiple transactions are pend¬ 
ing for the same line, the caches form a 
distributed queue of pending signals. The 
requests are thus serviced in a pipelined 
manner that eliminates the directory con¬ 
tention of a centralized-directory protocol. 

The naive forwarding of request signals 
can result in deadlocks. To avoid such 
deadlocks, forwarding does not occur un¬ 
der certain conditions; instead, a reply is 
generated as in the SCI protocol. 

Replacement. Replacement of lines 
that are linked in a list is handled by invali¬ 
dating the lower part of the list. A doubly 
linked list, as used by the SCI protocol, 
may be used to “patch” the list in case of re¬ 
placements. However, in practice, per¬ 
formance improvement depends on the list 
lengths and access patterns. The improve¬ 
ment will probably be small, especially if 
replacement of shared writable data is in¬ 
frequent. Some optimizations may be 
made to improve performance of the SDD 
protocol where long lists are expected. For 
example, in the case of read-only data, it’s 
unnecessary to form a linked list of caches 
that contain shared copies. Replacement 
can be done directly, involving only a local 
operation. Page replacement of read-only 
data can be handled by software-based 
mechanisms. 

Synchronization. A distributed-direc¬ 
tory cache-coherence protocol allows effi¬ 
cient implementation of locks at minimal 
extra cost. Familiar microprocessor archi¬ 
tectures have some form of atomic test- 
and-set instruction to implement spin 
locks. 3 The test-and-set instruction sets the 
value of a memory location and atomically 
returns the old value. When a process 
wants access to a lock, the processor per¬ 
forms the test-and-set instruction. If the 
operation is successful, the processor con¬ 
tinues; otherwise, the processor repeatedly 
tries to access the lock until it is successful. 

Figure 1. Sharing-list additions for the Stanford Distributed-Directory (SDD) 
protocol. 

Figure 2. Sharing-list removals for the SDD protocol. 

Spinning on a test-and-set instruction 
can cause a lot of network traffic each time 
a lock is released. A better alternative is a 
test-test-and-set sequence, where the first 
test is done in the cache and the 
test-and-set is performed only if the first 
test is successful. This will reduce network 
traffic, but network traffic due to lock 
acquisition will still be O(N^2), where N 
equals the number of processors contending 
for the lock. Further, such lock 
implementations can result in starvation, 
since the accessing of locks is not fair 
(perhaps due to differences in the network 
distance between a lock and its 
contenders). 
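
A test-test-and-set spin lock can be sketched in Python as follows. This is a software illustration only: the class and method names are assumptions, a threading.Lock stands in for the atomicity that hardware would provide, and the plain read of the flag stands in for a test that hits in the local cache.

```python
import threading


class SpinLock:
    """Illustrative test-test-and-set lock; not a production primitive."""

    def __init__(self) -> None:
        self._flag = False
        self._atomic = threading.Lock()     # stands in for hardware atomicity
        self.test_and_set_count = 0         # counts the "expensive" operations

    def test_and_set(self) -> bool:
        """Set the flag and atomically return its old value."""
        with self._atomic:
            old = self._flag
            self._flag = True
            self.test_and_set_count += 1
            return old

    def acquire(self) -> None:
        while True:
            while self._flag:               # first test: a plain local read
                pass
            if not self.test_and_set():     # second test: the atomic operation
                return                      # old value was False, lock obtained

    def release(self) -> None:
        self._flag = False
```

The counter shows how often the atomic operation is actually issued; waiting processors spin on the plain read and issue test_and_set only when the lock appears free.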

The SDD protocol allows a lock 
implementation that minimizes network 
traffic. Lock requests are queued and 
normally serviced in FIFO order. The network 
traffic is only O(N). Starvation is 
eliminated if there is one process per 
processor. 

Fine-grain locking is provided by having 
a lock state per cache line. 4 The main 
advantage of such a scheme is that the data 
can be obtained at the same time as the 
lock. 

The code for a lock procedure is 

procedure lock(var) 
begin 
    Queue-Lock(var) 
    while Test&Set(var) 
        Queue-Lock(var) 
end 

A distributed queue of caches waiting 
for a lock is formed by a mechanism simi¬ 
lar to the one used to form a distributed 
queue of pending signals. The implemen¬ 
tation of locks requires a few extra states 
and signals. 

When a queue-lock instruction is exe¬ 
cuted, a line in the lock-in-progress state is 
allocated, and a lock-miss signal is sent to 
the directory. If no other cache has locked 
the line, the directory sends a lock-granted 
signal along with the data. The first cache 
that receives the lock-granted signal is 
considered the upstream end. Otherwise, 
the directory updates its cache pointer to 
point to the requesting cache and forwards 
the lock-miss signal to the cache previ¬ 
ously pointed to by the cache pointer. The 
cache receiving this forwarded signal 
stores the address of the requesting cache 
(the downstream pointer) in its cache- 
pointer field. A set-upstream-pointer sig¬ 
nal with the old value of the cache pointer 
is also sent to the requesting cache. The last 
requesting cache is considered the down¬ 
stream end. The directory points to the 
downstream end. The requesting cache 
stores this upstream pointer in its data 
field. In this way, a doubly linked list of 
caches waiting for a lock is formed. 
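
The way the queue of waiting caches is built can be sketched as follows. The class, its attribute names, and the grant_next method are assumptions made for illustration; the sister states, replacements, and the resetting of the directory pointer after the queue drains are not modelled.

```python
from typing import Dict, Optional


class LockDirectory:
    """Illustrative model of the doubly linked queue of lock waiters."""

    def __init__(self) -> None:
        self.pointer: Optional[str] = None              # directory's cache pointer
        self.downstream: Dict[str, Optional[str]] = {}  # kept in each cache-pointer field
        self.upstream: Dict[str, Optional[str]] = {}    # kept in each line's data field
        self.holder: Optional[str] = None

    def lock_miss(self, cache: str) -> str:
        previous = self.pointer
        self.pointer = cache                 # directory points to the downstream end
        self.downstream[cache] = None
        if previous is None:
            self.upstream[cache] = None
            self.holder = cache              # no other cache holds the line
            return "lock-granted"
        # The lock-miss is forwarded to the previously pointed-to cache, which
        # records the requester as its downstream neighbour.
        self.downstream[previous] = cache
        # set-upstream-pointer: the requester learns its upstream neighbour.
        self.upstream[cache] = previous
        return "queued"

    def grant_next(self) -> Optional[str]:
        """Pass the lock to the next downstream cache, bypassing the directory."""
        if self.holder is None:
            return None
        next_cache = self.downstream.get(self.holder)
        if next_cache is None:
            return None                      # nobody waiting; line stays local
        self.holder = next_cache
        return next_cache
```

Repeated calls to grant_next hand the lock out in the order in which the lock-miss signals reached the directory.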

Lines in the lock-in-progress state do 


not have valid data, so these lines can store 
the upstream pointers for the doubly linked 
list without requiring extra memory. The 
lock-in-progress state has a few sister 
states used to ensure receipt of the signals 
required to form the doubly linked list. 

The test-and-set instruction returns a 
value of “true” until the lock is obtained at 
the local cache. A queue-lock instruction 
may be issued more than once. If the asso¬ 
ciated line has already been allocated in the 
cache, the instruction is redundant and has 
no effect. The queue-lock instruction in the 
while loop rejoins the queue of nodes 
waiting for a lock in case of replacements. 

More than one processor may share a 
cache, so there may be multiple processes 
per processor contending for the same 
lock. In that case, each process executes 
the lock procedure. 

The first queue-lock request to reach the 
cache causes a line in the cache to be linked 
to the doubly linked list of caches waiting 
for a lock. The processes associated with a 
cache contend for the lock by checking the 
cache locally, without generating any net¬ 
work traffic. 

When a lock-granted signal is received, 
a process has to obtain and release the lock 
to allow other caches waiting for the lock 
to get it. If no other cache wants the lock, 
the line is held locally. Processes that share 
the cache can now lock and unlock the line 
very efficiently in the cache without re¬ 


quiring any network traffic. 

After the lock has been used and released 
once locally, and if a lock-miss-forward 
signal is received, the lock is granted to the 
next downstream cache in the queue. This 
ensures fairness between processors. In the 
SDD protocol, the grant signal does not 
have to go through the directory, and the 
data is passed between the caches. The 
doubly linked list allows lock-grant signals 
to flow upstream. This is sometimes re¬ 
quired for process migration. 

The advantages of this scheme are that it 
(1) services lock requests, except for re¬ 
placements, in FIFO order; (2) eliminates 
starvation, if there is one process per cache; 
and (3) requires only O(N) operations. A 
similar scheme has been proposed by 
Goodman, Vernon, and Woest. 5 However, 
their original proposal required broadcast 
transactions and interactions with the main 
memory, which would have generated 
more network traffic. 

For details on the SDD cache-coherence 
protocol, see our earlier report. 2 ■ 
Multiple-Bus Shared-Memory System: 

Aquarius Project 


Michael Carlton, University of California at Berkeley 
Alvin Despain, University of Southern California 


Multi-multi architecture. The Aquar¬ 
ius project at the University of California 
at Berkeley has been investigating multi¬ 
processor systems composed of multiple 
buses, with the buses connected via 
different interconnection networks. The 
simplicity and efficiency of a single-bus 
multiprocessor 1 first led Goodman and Woest 2 
to consider it as a building block for the 
Wisconsin Multicube project’s large 


multiprocessors (that is, much larger than 
30 processors). Goodman’s work, in turn, 
led the Aquarius group to develop and 
investigate the architecture described 
here, which we call a multi-multi. 

Figure 1 shows the multiple-bus system 
architecture. It has several architectural 
features in common with the Wisconsin 
Multicube project, but it differs noticeably 
in the methods used to provide cache co¬ 


herency and in the distribution of memory 
modules. The example in Figure 1 contains 
only two dimensions, but the architecture 
is designed to handle several dimensions 
with a moderate number of processors per 
bus. It provides scaling to a large number 
of processors in a system. For example, a 
three-dimensional system with eight proc¬ 
essors per bus would contain 512 proces¬ 
sors. 
Figure 1. The multiple-bus system architecture. 


A key characteristic of the architecture 
is the amount of bandwidth it provides. If 
the number of dimensions is d and the 
number of processors per bus is p, then the 
number of processors in the system is 
n = p^d. The number of buses (and hence total 
bandwidth) is dn/p, so the amount of 
bandwidth available per processor is d/p. This 
shows that as the dimensionality of the 
system increases, so does the amount of 
bandwidth available. Increasing the 
dimensionality also increases the maximum 
number of bus broadcasts required for a 
transaction to travel between processors. 
In the worst case, where every broadcast 
requires d steps, the effective bandwidth 
per processor is constant at 1/p. Providing 
greater bandwidth as the number of 
processors increases supports the goal of 
designing a scalable system. 
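
These relations are easy to check numerically. The helper below is a small sketch (the function name is an assumption) that evaluates them for given d and p, with bandwidth measured in units of one bus.

```python
from fractions import Fraction
from typing import Tuple


def multi_multi(d: int, p: int) -> Tuple[int, int, Fraction, Fraction]:
    """Return (processors, buses, bandwidth per processor, worst-case bandwidth)."""
    n = p ** d                          # n = p^d processors
    buses = d * n // p                  # dn/p buses; exact, since p divides n
    per_processor = Fraction(buses, n)  # reduces to d/p
    worst_case = per_processor / d      # every transaction uses d broadcasts: 1/p
    return n, buses, per_processor, worst_case
```

For the three-dimensional example with eight processors per bus, the function gives 512 processors on 192 buses, 3/8 of a bus per processor, and 1/8 in the worst case.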

Another distinctive feature is that mem¬ 
ory modules are assigned to nodes rather 
than attached to buses. All memory ap¬ 
pears in the global shared-address space. 
The memory is interleaved on high bits so 
as to assign a large section of the address 
space to each of the physically local 
memories. This arrangement provides effi¬ 
cient support for private data and a natural 
division of directory information. Thus, 
each node can, if desired, run completely 
independently of all other nodes, or it can 
share to any desired degree. Sharing is 


most efficient among processors that share 
a bus but is possible among all processors 
in the system. 
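
Interleaving on the high address bits means that the node holding a block can be read straight from the top of the address. The sketch below assumes a power-of-two node count and a fixed address width; both parameters and the function name are illustrative and do not come from the Aquarius design.

```python
def root_node(address: int, address_bits: int, node_count: int) -> int:
    """Return the index of the node whose local memory holds `address`."""
    if node_count < 1 or node_count & (node_count - 1):
        raise ValueError("node_count must be a power of two in this sketch")
    node_bits = (node_count - 1).bit_length()
    # The high-order bits select the node, so each node owns one contiguous
    # section of the global shared-address space.
    return address >> (address_bits - node_bits)
```

With 512 nodes and 32-bit addresses, each node owns a contiguous 8-megabyte section, so address 0x00800000 is the first address of node 1.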

Nodes. Each node in the architecture 
contains a microprocessor, memory, and a 
cache. The processor is a high-perform¬ 
ance microprocessor. The local memory is 
large, preferably at least four megabytes, 
and is tagged with directory information 
on a per-block basis. The caches are fairly 
large (128 kilobytes each for both the data 
and instruction parts). They consist of the 
data store and a multiple-ported tag store, 
with ports for the processor and each exter¬ 
nal bus the node attaches to. 

Multi-multi protocol. The cache-co¬ 
herence protocol for the multi-multi archi¬ 
tecture combines features of snooping 
cache schemes, to provide consistency on 
individual buses, and features of directory 
schemes, to provide consistency between 
buses. The snooping cache component can 
take advantage of the low-latency commu¬ 
nication possible on shared buses for effi¬ 
ciency, yet the complete protocol will 
support many more processors than a 
single bus can. The resulting protocol natu¬ 
rally extends cache coherence from a 
multi to a multi-multi. 

The protocol uses split transactions for 
certain coherency actions. If a given coher¬ 


ency action cannot be completed in a single 
bus cycle (normally because insufficient 
information is available on that bus), it will 
be split, that is, some cache on the bus will 
relay the transaction to another bus. The 
cache that originated the transaction will 
enter a pending state until it receives a 
completion transaction. 

The unit of coherence is a cache block, 
that is, the protocol handles cache blocks 
as indivisible items. Each node in the system 
is identified by the address range 
contained in its local memory. Each block 
is logically associated with the node whose 
memory contains the block. This node is 
called the root node for that block. The root 
node contains the directory information 
for the block and serves as a synchroniza¬ 
tion point for concurrent accesses. 

Cache and directory states. Table 1 
describes the basic cache states. 

The shared states indicate that other 
caches in the system may have a copy of 
the same block. The local state guarantees 
that any other copies of the block will be on 
only the local bus. (See the section on local 
sharing.) 

The exclusive unmodified state is used 
only for blocks at their root node to support 
efficient access to private data. The locked 
state, an extension of the cache-lock-state 
protocol described by Bitar and Despain, 3 
supports synchronization primitives. A 
processor locks a block when it needs 
exclusive access to the block; a block in 
locked state cannot be accessed by other 
caches. The processor then unlocks the 
block when it is through modifying the 
block. The implementation of the cache 
also uses a few other states to indicate that 
the cache is arbitrating for the bus or that it 
has a read or write access pending after a 
transaction has been split. 


Table 1. Cache states. 

Cache State               Definition 
Invalid                   Not valid 
Shared-unmodified         Unmodified copy, possibly shared 
Local-shared-unmodified   Unmodified copy, possibly shared on local bus only 
Exclusive-modified        Modified copy, exclusively owned 
Exclusive-unmodified      Unmodified copy, exclusively owned 
Locked                    Locked copy, exclusively owned 


Table 2. Directory states. 

Directory State           Definition 
Uncached                  No cached copies exist 
Shared-unmodified         Unmodified, shared copies exist 
Local-shared-unmodified   Unmodified, only locally shared copies exist 
Exclusive-modified        A modified, exclusive copy exists 
Exclusive-root            A copy (possibly modified) exists at the root only 
Locked                    A locked copy exists 


The directory is distributed with the 
individual memories. The directory main¬ 
tains information about the state of each 
block, as well as the location of a single bus 
on which the block resides. This informa¬ 
tion is used when the block is cached by 
one node for write access or is shared 
locally among nodes on a single bus; in 
these cases, bus transactions can be di¬ 
rected to only that bus. 

Table 2 describes the basic directory 
states. The shared states allow for any 
number of cached copies, including zero. 
The exclusive root state does not indicate if 
the block has been written or not. This 
allows a root write to an exclusive-un¬ 


modified block to change state to exclu¬ 
sive-modified without updating the direc¬ 
tory or requiring a bus transaction (either 
of which would require stalling the proces¬ 
sor). 

Concepts. The multi-multi protocol 
contains several important concepts that 
enable efficient performance. 

Local sharing. The amount of sharing 
among nodes significantly affects system 
performance. Fortunately, the number of 
shared copies of a block tends to be low in 
practice. 4 This leads to the observation that 
if the blocks are shared among the nodes of 
a single bus, then the protocol can be opti¬ 
mized for this case. This is referred to as 
local sharing. 

The multi-multi protocol takes advan¬ 
tage of local sharing by observing when a 
block is shared only locally and by per¬ 
forming bus transactions on that bus only. 
For example, a write request by a processor 
needs only a single transaction to invali¬ 
date all copies of a block known to be 
shared locally. When a block is shared, but 
not known to be shared locally, a global 


invalidation must be used to invalidate all 
shared copies. 

To benefit from local sharing, the proto¬ 
col requires the local-shared-unmodified 
cache and directory states. Read requests 
are satisfied on the local bus when possible 
and indicate that the cached copy can go to 
the local shared state if this is so. 

Root node. The exclusive-unmodified 
cache state for blocks on the root node 
allows reads and writes to proceed without 
bus transactions. This is critical for private 
data. The exclusive-unmodified state is 
not used for blocks cached on other nodes 
because of the need to inform the directory 
when the block is written and the cost 
associated with this. 

In practice, a read miss by the root node 
to a block will be filled, and the cache state 
set to exclusive-unmodified. If the process 
then writes to the block, the cache state 
will change to exclusive-modified with¬ 
out taking a cache miss. This allows pri¬ 
vate data accesses to proceed without bus 
transactions. 
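
The effect of the cache states on a write hit can be summarized in a small sketch. The enum mirrors Table 1; the function name and the three transaction labels are assumptions made for illustration, and write misses and the locked state are deliberately left out.

```python
from enum import Enum
from typing import Tuple


class CacheState(Enum):
    INVALID = "invalid"
    SHARED_UNMODIFIED = "shared-unmodified"
    LOCAL_SHARED_UNMODIFIED = "local-shared-unmodified"
    EXCLUSIVE_MODIFIED = "exclusive-modified"
    EXCLUSIVE_UNMODIFIED = "exclusive-unmodified"
    LOCKED = "locked"


def write_hit(state: CacheState) -> Tuple[CacheState, str]:
    """Return (next cache state, bus transaction needed) for a processor write."""
    if state in (CacheState.EXCLUSIVE_UNMODIFIED, CacheState.EXCLUSIVE_MODIFIED):
        # Private data at the root node: no bus traffic, no directory update.
        return CacheState.EXCLUSIVE_MODIFIED, "none"
    if state is CacheState.LOCAL_SHARED_UNMODIFIED:
        # All other copies are on the local bus: one transaction suffices.
        return CacheState.EXCLUSIVE_MODIFIED, "local-invalidate"
    if state is CacheState.SHARED_UNMODIFIED:
        # Copies may exist anywhere in the system.
        return CacheState.EXCLUSIVE_MODIFIED, "global-invalidate"
    raise ValueError(f"write hit not modelled for state {state.value}")
```

The first branch is the case that lets private data at the root node proceed without bus transactions; the other two show why local sharing is cheaper than general sharing.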

Bus addresses in the directory. The 
protocol maintains the address of the bus 
on which a block resides when the block is 
owned exclusively or when it is shared 
locally. By keeping this location, the proto¬ 
col greatly reduces the number of global 
invalidations by directing invalidations to 
this bus instead of falling back on global 
transactions. This reduces global invalida¬ 
tions to only those that are actually neces¬ 
sary and requires only a reasonable amount 
of space overhead in the directory. The 
amount of state required is just log2(dn/p) 
bits per block. 
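
As a rough check of this overhead, the sketch below computes the number of bits needed to name one bus. Rounding up to a whole number of bits is an assumption added here, since log2(dn/p) is not always an integer; the function name is also illustrative.

```python
def bus_address_bits(d: int, p: int) -> int:
    """Bits needed per block to identify one of the dn/p buses."""
    buses = d * p ** (d - 1)            # dn/p with n = p^d
    # Smallest b such that 2**b >= buses, computed with integer arithmetic.
    return (buses - 1).bit_length()
```

For the three-dimensional system with eight processors per bus there are 192 buses, so eight bits per block are enough.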

The protocol forces transactions to no¬ 
tify the directory when a block changes 
from being cached on a single bus to being 
cached on multiple buses. ■ 
Establishing a standard metrics program 

Fletcher J. Buckley 


The July 1989 Standards section iden¬ 
tified an initial set of software metrics 
that had intrinsic worth on the test floor. 
Perhaps the time has now arrived to ex¬ 
tend that work to the establishment of a 
standard metrics program for organiza¬ 
tions. 

First, here’s a word of caution. A num¬ 
ber of nettles are associated with any met¬ 
rics effort. The worst is to attempt to use a 
metrics program to produce data outside 
its range of validity and usefulness. Met¬ 
rics should simply provide indicators of 
where improvements can be made. As an 
example, any metrics effort used as a ba¬ 
sis for personnel actions (favorable or un¬ 
favorable) will quickly lose all validity. 

One way to start the process is to deter¬ 
mine what metrics the organization needs 
(determine the requirements) and con¬ 
centrate on producing them. An alternate 
approach — the one proposed here — is to 
work by trial and error towards a 
meaningful program (rapid prototyping). 

Such a project will have to be done on a 
shoestring since it almost certainly will 
not have been budgeted. The challenge 
will be to generate an immediate product 
that makes sense. Recognizing that there 
are just as many difficulties associated 
with a metrics effort as with any other 
project, certain guidelines should be con¬ 
sidered. 

Guidelines for the initial metrics ef¬ 
fort. The collection effort should be 
minimal, meaning the data to be pro¬ 
cessed should already be in the collection 
phase so that the total metrics effort will 
not be viewed as another burden on the 
shoulders of an already overworked or¬ 
ganization. Consider how the program¬ 
mer will feel when he or she learns that 
there’s another daily or weekly sheet to 
fill out. Apply enough pressure and the 
programmer will complete the form, but 
the validity of the data supplied will be
highly questionable.

The raw metric data must be such that it 
can be processed automatically. Manual 
processing of metric data usually yields 
inaccurate, out-of-date, and expensive- 
to-obtain results. Even worse, from a 
practical standpoint, attempting to pro¬ 
cess metric data using manual methods 


precipitates a resource struggle. Most re¬ 
sources are already scarce, and good hu¬ 
man resources are even more so. If not 
used efficiently, the people will eventu¬ 
ally be reassigned to what are perceived 
to be more immediate tasks, and the metrics
effort will slowly die.

The initial metrics effort should rely on 
computer programs that already exist in 
some basic form. After the earliest prod¬ 
ucts of the effort have been generated, re¬ 
quests for resources to extend the efforts 
will be more favorably received. Requesting
resources to establish a basic
capability before the initial products 
have been provided will place the metrics 
effort on a par with many other projects. 
Each project has to compete for resources 
and each will likely promise improve¬ 
ments. Being practical, most upper man¬ 
agement tends to exploit success and dis¬ 
card failure. The best way to be success¬ 
ful is to have a product. 

Finally, and most importantly, the met¬ 
rics must be viewed as worthwhile. 

All of this leads to iteration on what 
metrics are desired and which ones can be 
immediately produced, given the organ¬ 
izational environment. The nettle can be 
grasped at many points. The easiest place 
to start is to look at what data is already 
available. 

Data sources. Two immediate sources 
of raw data are the software configuration 
management library and the cost ac¬ 
counting system. The SCML holds the 
source files (documents and source 
code). The cost accounting system con¬ 
tains the cost estimates made at the start 
of the project and the records of the actual 
costs accumulated as the project pro¬ 
ceeds. 

Documentation metrics. Examining 
the documentation source files, we can 
easily obtain the following: 

(1) Documentation sizes. These can be 
easily measured in words using any standard
word-counting program (for example, WC
on the VAX/VMS shell facility). Metrics
can then be provided for individual
documents (the XXX software
requirements specification (SRS)), for 


individual types of document (the total of 
the sizes for all the SRSs in the project), 
by computer program (the total of all the 
sizes of the individual documents pro¬ 
duced for a specific computer program), 
and by project. 

(2) Documentation changes. These 
are the actual number of changes for each 
document identified above where one 
change is defined as one formal issue of a 
set of change pages to an original docu¬ 
ment. In some organizations, these are is¬ 
sued through the use of specification 
change notices or document change no¬ 
tices. In turn, this data can be provided by 
individual documents (for example, ZZZ 
changes in the XXX SRS). 

(3) Size of documentation changes. 
These consist of the actual word counts 
for the changes for each document identi¬ 
fied above. 

With some further efforts, additional 
insight can be obtained on requirements 
stability. 
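
As a minimal sketch of the documentation size metric described above, the following Python counts words per document and rolls the totals up by document type, by computer program, and by project. The record layout (project, program, document type, text) and the sample data are assumptions made for illustration; a real SCML would supply its own identifiers.

```python
from collections import defaultdict


def word_count(text: str) -> int:
    """Count whitespace-separated words, as a basic word counter would."""
    return len(text.split())


def size_rollups(documents):
    """Roll word counts up by document type, computer program, and project.

    `documents` is an iterable of (project, program, doc_type, text) tuples.
    This layout is hypothetical, not something the SCML prescribes.
    """
    by_type = defaultdict(int)
    by_program = defaultdict(int)
    by_project = defaultdict(int)
    for project, program, doc_type, text in documents:
        n = word_count(text)
        by_type[(project, doc_type)] += n
        by_program[(project, program)] += n
        by_project[project] += n
    return dict(by_type), dict(by_program), dict(by_project)


# Illustrative data only.
docs = [
    ("P1", "nav", "SRS", "the system shall report position"),
    ("P1", "nav", "design", "position is computed every cycle"),
    ("P1", "ui", "SRS", "the display shall show position"),
]
by_type, by_program, by_project = size_rollups(docs)
```

Running the same count over each set of change pages would give the size of documentation changes in the same units.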

Source code metrics. In a manner simi¬ 
lar to documentation metrics, the follow¬ 
ing can be easily obtained: 

(1) The number of source files of code. 
This data can be further grouped by 
source file size (the number of source 
files containing less than 101 executable 
lines of code, the number of source files 
containing 101 or more lines of source 
code but less than 201 source lines of 
code, etc.). This can be further grouped 
by computer program and by project. 

(2) Number of revisions to source 
code files where a revision is defined as 
the issuance of a new version of a source 
code file after the issuance of the initial 
version. This can be grouped by source 
file size using the granularity identified 
above and then by the number of revi¬ 
sions. For example, of 440 source files, 
each of which contained less than 101 
source lines of code, 200 were never re¬ 
vised, 40 were revised once, etc. This, in 
turn, could be used to provide some in-house
numbers on the validity of the
speculations that more changes are made 
in larger source files. 

(3) Actual number of source lines of 
code, grouped by computer program and
by project. 

All of this assumes that 

(1) the organization has some com¬ 
mon set of standards for source code, 

(2) a standards checking tool (com¬ 
puter program) is used when a source file 
is entered into the SCML, and 

(3) the number of source lines of code 
in a source file is automatically provided 
as part of the standards checking effort. 
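
The grouping by source file size and by number of revisions can be sketched as follows, assuming the standards checking tool has already supplied a source-line count and a revision count for each file. The function names and sample values are hypothetical; only the 100-line band width comes from the text.

```python
from collections import Counter

BAND_WIDTH = 100  # the bands in the text: fewer than 101, 101 to 200, ...


def size_band(sloc: int) -> tuple[int, int]:
    """Return the (low, high) band that contains `sloc` source lines.

    An empty file (0 lines) is counted in the first band, labelled (1, 100).
    """
    index = max(sloc - 1, 0) // BAND_WIDTH
    low = index * BAND_WIDTH + 1
    return (low, low + BAND_WIDTH - 1)


def revisions_by_band(files):
    """Cross-tabulate files by size band and by number of revisions.

    `files` is an iterable of (sloc, revisions) pairs, one per source file.
    """
    return Counter((size_band(sloc), revs) for sloc, revs in files)


# Illustrative data only.
sample = [(40, 0), (100, 0), (101, 1), (250, 3), (90, 1)]
table = revisions_by_band(sample)
```

Each key of `table` is a (band, revisions) pair, so a statement such as "of the files with fewer than 101 lines, N were never revised" is read off as `table[((1, 100), 0)]`.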

Problem/change report metrics. If the
problem/change reports are kept in auto¬ 
mated form in the SCML or elsewhere, 
metrics are easily derivable. The initial
data is the number of open and closed
problem/change reports, arranged by
computer program (as previously described
in the July 1989 article).

A proposed set of categories for problem causes

(1) Hardware interface: This is specified when there is an interface
problem between a computer program and hardware.

(2) Software interface: This is specified when there is an interface
problem between two different computer programs.

(3) Requirements: This is specified when the resolution of the problem
requires a change in an SRS or in a higher level document and the problem
is not a hardware or software interface problem as specified in (1) or
(2) above.

(4) Design: This is specified when the resolution of the problem
requires a change in the design documentation and does not require
a change in an SRS.

(5) Code: This is specified when the resolution of the problem requires
a change in a source file and does not require a change in the design
documentation or the SRS.

This is further subdivided into:

• Standards violation: This is specified when the problem is limited
to a failure on the part of the source file of code to comply with the
project standards as determined by the SCML standards checking tool.

• Other.

(6) Invalid: This is specified when the problem/change report is
declared invalid by the software manager.

(7) Other: This is specified when the resolution of the problem is not
placed in one of the above categories.

Note that each problem/change report is placed in one and only one
category prior to being closed.

If the data on the problem/change re¬ 
port can be processed, much more insight 
can be obtained. 

Of direct interest is the determination 
of the cause of the problem. If we can find 
some objective evidence of the causes of 
the problems in our shops, then we have a 
valid basis for improvements. (Pro¬ 
nouncements about industrywide prob¬ 
lems have little credibility when it comes 
to establishing projects to provide cures. 
Without an ability to provide hard data on 
local problems, the rest will probably be 
dismissed as idle speculation.) 

The accompanying sidebar shows an 
example set of categories that could be 
superimposed on every problem/change 
report. This data, in turn, can be arranged 
by problem cause, by computer program 
by problem cause, by project by problem 
cause, and so forth. Much more elaborate 
categories can be developed. The ones 
shown in the sidebar are simple, immedi¬ 
ately usable, and capable of providing a 
basis for more elaboration as experience 
is gained. 
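
One way to superimpose such categories on automated problem/change reports is sketched below. The enumeration follows the sidebar, with the two Code subdivisions as separate members; the report layout (program, status, cause) and the sample data are assumptions for illustration only.

```python
from collections import Counter
from enum import Enum


class Cause(Enum):
    """Problem causes from the sidebar."""
    HARDWARE_INTERFACE = "hardware interface"
    SOFTWARE_INTERFACE = "software interface"
    REQUIREMENTS = "requirements"
    DESIGN = "design"
    CODE_STANDARDS_VIOLATION = "code: standards violation"
    CODE_OTHER = "code: other"
    INVALID = "invalid"
    OTHER = "other"


def tally_closed(reports):
    """Tally closed reports by cause, and by computer program by cause.

    `reports` is an iterable of (program, status, cause) tuples, a
    hypothetical layout. A closed report must carry exactly one Cause.
    """
    by_cause = Counter()
    by_program_cause = Counter()
    for program, status, cause in reports:
        if status != "closed":
            continue  # open reports need not be categorized yet
        if not isinstance(cause, Cause):
            raise ValueError(f"closed report for {program!r} has no single cause")
        by_cause[cause] += 1
        by_program_cause[(program, cause)] += 1
    return by_cause, by_program_cause


# Illustrative data only.
reports = [
    ("nav", "closed", Cause.CODE_OTHER),
    ("nav", "closed", Cause.REQUIREMENTS),
    ("ui", "closed", Cause.CODE_OTHER),
    ("ui", "open", None),
]
by_cause, by_program_cause = tally_closed(reports)
```

Holding a single `Cause` value per report is what enforces the "one and only one category" rule; adding a project field to the key would give the by-project arrangement.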

Caution should be used, however, in 
using data from the problem/change re¬ 
port (for example, an entry indicating the 
cost or time used to fix the problem). That 
data is notoriously inaccurate and will 
probably differ from the data in the cost 
accounting system. Reconciling this data 
with the cost accounting system data is a 
thankless task, done after the fact and ac¬ 
companied by much pain and anguish. 

Cost metrics. These can be directly de¬ 
rived from the cost accounting system to 
provide the actual project cost measured 
in dollars, francs, pounds, yen, etc. With a
minimal effort, finer grain data can be 
provided on the cost by computer pro¬ 
gram, by development phase, by com¬ 
puter program by development phase, 
and by project. This may take some nego¬ 
tiation with the cost accounting system 
for the entries to be made by phase by 
computer program, and some of the detail 
may have to be left to the next implemen¬ 
tation. However, this enables the organi¬ 
zation to accurately determine how much 
money is really being spent (for example, 
in integration and test). 

Productivity metrics. Increasing the 
productivity of the processes is vital to 
upper management. To do more than 
merely communicate in vague generali¬ 
ties, productivity measures are required. 

Looking at this from a simplistic view¬ 
point, software people produce two 
things: code and documentation. Build¬ 
ing from the previous efforts, we can de¬ 


rive a code productivity metric by divid¬ 
ing the total number of source lines of 
code delivered at the end of the project by 
the sum of all the costs of all the software 
efforts (from the start of the preliminary 
design phase through the completion of 
the computer program testing phase). In a 
similar manner, a documentation produc¬ 
tivity metric can be computed. 
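
The division described above can be sketched in a few lines. The phase names and all figures here are invented for illustration; the only assumption carried over from the text is that costs are summed from preliminary design through computer program testing.

```python
def productivity(units_delivered: float, phase_costs: dict) -> float:
    """Units delivered (source lines or words) per unit of software cost."""
    total_cost = sum(phase_costs.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return units_delivered / total_cost


# Illustrative figures only, in dollars per phase.
phase_costs = {
    "preliminary design": 100_000,
    "detailed design": 150_000,
    "coding": 200_000,
    "testing": 150_000,
}
sloc_per_dollar = productivity(12_000, phase_costs)   # code productivity
words_per_dollar = productivity(90_000, phase_costs)  # documentation productivity
```

Both metrics share the same denominator here because the text divides by the sum of all software costs; an organization that can separate documentation cost from coding cost may prefer separate denominators.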


Rework metrics. This has become a 
popular topic in the software field and is 
well-recognized in hardware quality as¬ 
surance. As proposals, consider the fol¬ 
lowing preliminary definitions: 

(1) Requirements rework: This is 
computed by dividing the current sum of 
the SRSs’ documentation size changes 
computed above by the sum of the sizes of 
the SRSs formally established prior to 
beginning the top-level design. 

(2) Interface rework: This is computed 
by dividing the current sum of the inter¬ 
face specification documentation size 
changes computed above by the size of 
the interface documentation formally es¬ 
tablished prior to beginning the top-level 
design. 

(3) Design rework: This is computed 
by dividing the current sum of the design 
documentation changes by the sum of the 
sizes of those documents prior to the start 
of coding. 

(4) Code rework: This is computed by 
dividing the total number of revisions to 
all the source files of code by the total 
number of source files that completed 
their unit tests. 

Other metrics can be developed in a 
similar manner (for example, for test 
documentation). 
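
All four preliminary definitions are ratios of accumulated change to an established baseline, so a single helper covers them. The sketch below uses invented numbers; the variable names are hypothetical.

```python
def rework_ratio(changed: float, baseline: float) -> float:
    """Accumulated change divided by the formally established baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return changed / baseline


# Requirements rework: word counts of SRS changes over baselined SRS sizes.
srs_baseline_words = [12_000, 8_000]    # sizes before top-level design began
srs_change_words = [1_500, 500, 2_000]  # sizes of the change issues to date
requirements_rework = rework_ratio(sum(srs_change_words), sum(srs_baseline_words))

# Code rework: revisions to all source files over files that completed unit test.
code_rework = rework_ratio(330, 440)
```

Interface and design rework follow the same pattern with their own change sizes and baselines.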

Further extensions. In a similar man¬ 
ner, the above efforts can be further ex¬ 
tended to provide indicators on how well 
the cost and schedule estimations are 
made. For example, from the initial cost 
estimates, we can obtain the estimated 
documentation sizes, number of source 
code files, number of source lines of 
code, estimated cost, and so forth. We can 
then develop an appreciation for the ac¬ 
curacy of our initial estimates. 
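
One possible indicator of estimating accuracy, offered as an assumption since the text does not define one, is the relative error of each initial estimate against the corresponding actual. The quantities and figures below are illustrative only.

```python
def relative_error(estimate: float, actual: float) -> float:
    """(actual - estimate) / estimate; positive means an underestimate."""
    if estimate == 0:
        raise ValueError("estimate must be nonzero")
    return (actual - estimate) / estimate


# Illustrative figures only.
estimates = {"SRS words": 10_000, "source files": 400, "SLOC": 30_000, "cost": 500_000}
actuals = {"SRS words": 12_000, "source files": 440, "SLOC": 36_000, "cost": 650_000}

accuracy = {name: relative_error(estimates[name], actuals[name]) for name in estimates}
```

Tracking these values across several projects would show whether a given kind of estimate is consistently high or low.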

The bottom line. The ultimate out¬ 
growth of all this is that, assuming we 
have a minimal software configuration 
management operation and a reasonable 
cost-accounting system, we can piggy¬ 
back on those efforts to obtain meaning¬ 
ful management metric indicators at little 
or no cost. From those indicators, we can 
gain significant insight on where to apply 
further effort to make substantial im¬ 
provements in our operations. Why 
aren’t we doing so now? 
Fields out as DARPA head, perhaps not by choice

Steve H. Wilcox, Staff Editor

Craig Fields has left his post as head of the Defense Advanced
Research Projects Agency amid reports the Pentagon forced him to take
another position because of his support for technologies with both
civilian and military applications.

Fields ran afoul of Bush administration officials who opposed the idea
of funding specific technologies and industries while ignoring others,
according to unnamed DARPA officials quoted by the New York Times,
Associated Press, and several other news services. DARPA’s recent
decision to invest $4 million in Gazelle Microcircuits, an electronics
firm working on gallium arsenide products, was a major factor in the
Pentagon decision, according to the published reports.

The Defense Dept. has maintained that Fields voluntarily accepted his
new position as deputy director of the Defense Research and Engineering
Organization, which gives him responsibility for streamlining the
Defense Dept.’s research and development system.

Fields has not commented on his new post nor on the circumstances of
his leaving the DARPA position. DARPA’s former Deputy Director Victor
Reis is acting director until a new director is named.

Fields’ reassignment drew fire from Congress. House Majority Leader
Richard Gephardt (D-Mo.) joined 10 Democratic and Republican lawmakers
in demanding Fields’ reinstatement.

“Firing and silencing such a key player in the US competitiveness
debate is at best short-sighted, at worst a major breach of our future
economic security,” the group stated in a letter of protest sent to the
Pentagon. The group also called on House Armed Services Committee Chair
Les Aspin to hold a hearing on the affair.

About half of DARPA’s $1.2 billion budget goes to technologies with
both civilian and military applications, said one of the letter’s
cosigners, Rep. Mel Levine (D-Calif.), at a news conference. He also
said he is drafting legislation to give control of DARPA to the
Commerce Dept. and to require that most DARPA funds go toward dual-use
technologies.

A statement from the American Electronics Association expressed
“disappointment” in Fields’ leaving DARPA.

“At times it seems that the administration is preoccupied with the
philosophical content of its relationship with industry,” stated
J. Richard Iverson, AEA president and chief executive officer, in
decrying the lack of a national strategy for leading-edge industries.

[Photo caption: Craig Fields]


West gives East the boot in second Computer Bowl

Five computing heavyweights from the West Coast knew which state’s cave
formations inspired the Colossal Cave in the computer game Adventure.
That bit of obscure knowledge led them to victory over five computing
giants from the East Coast in the second annual Computer Bowl, held
April 27 at Boston’s World Trade Center.

The bowl pits two teams of experts in a “game show” that tests their
knowledge of computer history, business, folklore, and trivia.
Questions included: What is the name of the computer that belongs to
Bloom County comic strip character Oliver Wendell Jones? and What was
the first major movie to use computer-aided animation?

The event, which is now tied at 1-1 (the East Coast team won the first
bowl last year), benefits the Computer Museum, which is exclusively
devoted to computers and their impact on society.

“The defeat of unknown nerds from failing East Coast companies was
inevitable,” said West Coast captain L. John Doerr, partner in the
venture capital firm of Kleiner Perkins Caufield and Byers. His
teammates were Stewart Alsop II, editor-publisher of PC Letter;
William H. Gates, chair of Microsoft Corp.; Charles House, general
manager of Hewlett-Packard’s Software Engineering Systems Division;
and Lawrence Tesler, vice president of advanced technology at Apple
Computer.

The East Coast team consisted of captain Patrick J. McGovern, founder
and chair of the International Data Group; William Foster, president
and chief executive officer of Stratus Computer; Robert Frankston,
chief scientist at Lotus Development; Edward Fredkin, professor of
physics at Boston University; and Russell Planitzer, chair of Prime
Computer.

The Public Broadcasting System television program, The Computer
Chronicles, beamed the bowl live via satellite to San Francisco and
Santa Clara, Calif., Seattle, and Dallas. It also televised the bowl
nationwide in late May.

The third Computer Bowl will be cohosted by the West Coast team
April 26, 1991.

(Incidentally, the inspiration for Colossal Cave is in Tennessee, the
Bloom County computer was a Banana Junior, and Futureworld was the
first movie to use computer-aided animation.)
National Science Foundation report assailed, defended


In a strongly worded letter to William 
A. Wulf, assistant director for computer 
and information science and engineering 
at the National Science Foundation,
IEEE Computer Society President Helen
M. Wood protested “unsubstantiated, in¬ 
accurate, and damaging assertions” 
made about the society in a recent report 
issued by Wulf’s office.

“The report, Women in Computer Sci¬ 
ence, by Nancy Leveson, does make use¬ 
ful points and offer positive suggestions 
for improving what is assuredly a critical 
problem [the continued underutilization 
of women and minorities] in computer 
science and engineering...,” Wood stated 
in her letter. She commends the NSF for 
continuing to focus high-level attention 
on the problem but feels that the treat¬ 
ment of the Computer Society in the re¬ 
port was unwarranted. 

In comparing the role of women in the 
leadership of the ACM and of the IEEE 
Computer Society, the report says 

In contrast to the democratic structure of the 
ACM, which stresses elections, the IEEE 
Computer Society officers are, for the most 
part, selected by the previous officers, i.e., 
an “old boy’s” setup. 

Society president rebuts charge. “In
fact, the Computer Society is governed
by a board of 21 members, seven of whom 
are elected by the membership each year 
to three-year terms,” wrote Wood on be¬ 
half of the society’s Board of Governors. 
“The society membership each year also 
elects a president-elect and a first and 
second vice president. The current presi¬ 
dent, president-elect, and immediate 
past-president, as well as the first and sec¬ 
ond vice presidents, are also franchised 
members of the Board of Governors. The 
board appoints a nominations committee, 
but there is also a clearly defined and 
well-publicized petition process which 
provides a fairly easy path for individuals 
not nominated by the normal process to
gain access to the elections. Although
uncontested elections have occurred, 
they have been extremely rare. To be sure 
that the society does not fall into the trap 
of stagnant leadership, we have fixed lim¬ 
its on the number of years any individual 
can serve in any position, including...the 
Board of Governors. How this democrati¬ 
cally designed process can be character¬ 
ized as an ‘old boy’s setup’ defies under¬ 
standing. 

“We would be the first to agree that, 
like others, we have not achieved as much 
progress in recruiting women to leader¬ 
ship posts as we would like, but we have 
made more progress than Professor 
Leveson suggests. Including the present 
year, the president of the IEEE Computer 
Society has been female in three of the 
last seven years. (We’ve also had at least 
two Orientals, one black, and one His¬ 
panic serve as president in our recent his¬ 
tory.) The current first vice president is 
female. Another vice president who has 
responsibility for our periodical publica¬ 
tions (our largest VP portfolio) is female. 
And maybe most importantly, we have a 
substantial and growing cadre of female 
volunteer leaders...whom we fully ex¬ 
pect to rise through the ranks to board and 
officer positions over the next few years. 

“The IEEE Computer Society doesn’t 
have the answer to the problem of under¬ 
representation of women and minorities 
in computer science and engineering, but 
neither are we insensitive to the issue.... 
We would be pleased for Professor 
Leveson and any other professional in the 
field to join us in developing positive 
strategies to further improve the position 
of women in science and engineering in 
the future.” 

Author defends data. Responding to 
President Wood’s letter to Wulf,
Leveson, an associate professor in infor¬
mation and computer science at the Uni¬ 
versity of California, Irvine, wrote, “I am
sorry that you have found an objection to
statements in my NSF report, but I stand 
by the numbers I published. 

“The percentage of women serving as 
editors-in-chief and on editorial boards 
for both IEEE and ACM was collected 
from publications put out by both socie¬ 
ties,” Leveson continued. “This is open 
information, and although editorial 
board membership is somewhat fluid and 
thus depends upon the time that it is 
checked, these were the numbers when I 
was collecting them last year. 

“Although the Board of Governors and 
other offices are elected in the IEEE 
[Computer Society], the technical com¬ 
mittee chairs are appointed, as are the ex¬ 
ecutive committees for the TCs (whereas 
they are elected in the ACM). This is the 
only explanation I can find for the very 
different percentages that exist between 
IEEE technical committees and ACM 
special interest groups. Your comment 
that women are well represented in the 
elected positions of the IEEE only serves 
to support my hypothesis. Unfortunately, 
many of the technical positions where re¬ 
searchers are likely to participate (the fo¬ 
cus of my report) are appointed.” 

Leveson, who is active in both organi¬ 
zations at the technical-committee and 
special-interest-group level, went on to 
say that the ACM seems better able to find 
qualified women for editor-in-chief and 
editorial-board positions. In addition, 
she said she knows of highly qualified 
female researchers who have been unable 
to break into the “inner circle” of IEEE 
TCs in some (though not all) technical 
areas. 

“I appreciate your personal sincerity 
and the efforts by many people involved 
with the Computer Society,” Leveson 
said, “and I agree that at the highest levels 
there are women involved. But at the 
lower, technical levels there are many 
fewer women than would be predicted by 
the numbers in the field and in other tech¬ 
nical groups.” 
Initial meeting shows high interest in software reliability engineering

Michael Lyu, Vice Chair, Technical Subcommittee on Software Reliability Engineering 


A keynote address, technical talks, a 
panel discussion, and organizational is¬ 
sues marked the initial meeting of the 
Subcommittee on Software Reliability 
Engineering April 12-13 in Washington, 
DC. More than 150 researchers and prac¬ 
titioners attended. NASA hosted the 
meeting in a symposium format, with a 
full two-day program and preassembled 
proceedings. 

Chartered by the IEEE Computer Soci¬ 
ety’s Technical Committee on Software 
Engineering (TCSE), the Subcommittee 
on Software Reliability Engineering was 
formed to focus on, coordinate, and fa¬ 
cilitate the exploration and application of 
current and emerging techniques in soft¬ 
ware reliability engineering (SRE). 

Needs, goals, techniques. In his open¬ 
ing remarks, Carl Schneider, director of 
NASA’s Reliability, Maintainability, 
and Quality Assurance Division, as¬ 
serted the “need for a quantitative or sta¬ 
tistical approach to software reliability.” 
He encouraged participants to identify a 
common discipline in SRE and to develop 
quantitative assessment techniques for 
software reliability improvement and 
software process management. 

James Paul, a professional staff mem¬ 
ber on the US House Science, Space, and 
Technology Committee, delivered the 
keynote speech, “Software Reliability: 
Why Capitol Hill Thinks It Is Important.” 
Paul said the regulatory agencies of the 
federal government need better methods 
to evaluate the quality and reliability of 
software in medical devices, aircraft, nu¬ 
clear power facilities, and a host of other 
safety-critical systems. He also dis¬ 
cussed recommendations made in a study 
by the House Subcommittee on Investi¬ 
gations and Oversight, especially those 
aimed at developing capabilities for im¬ 
proved reliability engineering in soft¬ 
ware. 

Software reliability researchers from 
industry and academia, including active 
practitioners from AT&T Bell Labs, 
Bellcore, Cray Research, Ford Aero¬ 
space, IBM, the Jet Propulsion Lab, and 
Hewlett-Packard, presented technical 
talks exploring both theoretical issues 
and practical experiences. Major topics 
included new software reliability mod¬ 
els, approaches, techniques, and issues 
and concerns of model usage and data col¬ 
lection efforts. A wide range of reports on 
SRE applications covered such topics as
the space shuttle, the space station, tele¬
phone switching systems, supercomput¬ 
ers, medical imaging systems, and de¬ 
fense systems. 

Organizational issues. In a discussion 
of organizational issues, it was decided 
that the subcommittee should sponsor a 
symposium each spring and fall, publish 
a summer and winter newsletter, and con¬ 
duct committee meetings as needed. The 
subcommittee structure will include a 
steering committee consisting of a chair,
vice chairs for operation/administration,
a treasurer, chairs for other committees, 
and a TCSE advisor. 

Additionally, there will be an educa¬ 
tion committee, a technical issues com¬ 
mittee, an application data committee, a 
next symposium committee, a subse¬ 
quent symposium committee, and a liai¬ 
son committee. The next symposium 
meeting, to be chaired by Anneliese von 
Mayrhauser of the Illinois Institute of 
Technology, is scheduled for spring 
1991. 


Panel session. The meeting concluded with a lively two-hour panel
session on “Software Reliability Engineering: How Do We Get It Widely
Used,” moderated by Michael Lyu of the Jet Propulsion Lab. Invited
panelists were Walter Ellis, IBM; Amrit Goel, Syracuse University;
Jean-Claude Laprie, LAAS, France; Bev Littlewood, City University,
London; John Musa, AT&T Bell Labs; Martin Shooman, Polytechnic
University; and Robert Troy, Verilog. Panelists and audience members
participated in a discussion of merits and weaknesses of SRE; software
reliability models, measurements, and practices; technical and
management issues; techniques for reliable software; software safety
and criticality; and ultra reliable software.

For more information on the SRE subcommittee and its ongoing
activities, contact A. Frank Ackerman, Steering Committee Chair,
Institute For Zero Defect Software, 85 Poplar Dr., Stirling, NJ 07980,
phone (201) 604-8701, fax (201) 604-8702.


Press award honors authors of Computer article

Thomas A. DeFanti and Maxine D. Brown of the University of Illinois and
Bruce H. McCormick of Texas A&M University received a runner-up
Computer Press Award for their article “Visualization: Expanding
Scientific and Engineering Research Opportunities,” published in the
August 1989 issue of Computer (pp. 12-25). The category, best feature
in a computer publication, was one of the most competitive, with 188
entries.

The article points out the qualitative change that occurs in
information when visualization brings the eye-brain system into play.
Color photographs provide examples from molecular modeling, medical
imaging, mathematics, geosciences, astrophysics, and other scientific
disciplines.

“What a challenge to help the reader visualize the concept of
visualization,” said category judge Bill Brohaugh, editorial director
for Writer’s Digest. “And what a solid success in tackling that
challenge in this article. Broad concepts and applications are clearly
explained, interestingly presented, and appropriately detailed. The
story covers the hardware, the software and, best of all, the meaning
of visualization and its role in helping us understand the world in
general. Visualized well, written well, presented well.”

The fifth annual Computer Press Awards, held April 17 in New York, was
cosponsored by Citizen America Corp. and the Computer Press
Association. More than 800 entries competed in 19 categories, with
general-interest publications, computer magazines and newspapers,
newsletters, books, and television programs all receiving recognition.

The awards were made for works published in 1989. Entries for works
written in 1990 will be solicited beginning in November.

The Computer Press Association, a nonprofit professional organization,
determines the categories and rules, selects the judges, screens the
entries, and determines the winners. Citizen America, a
California-based marketer of computers and printers, funds the awards
effort.
NEW! SOFTWARE ENGINEERING TITLES
from the 

IEEE COMPUTER SOCIETY PRESS 


Just Published! 

SYSTEM AND SOFTWARE REQUIREMENTS 
ENGINEERING 

by Richard H. Thayer and Merlin Dorfman 

CS Press Tutorial. 735 pages. March 1990. Hardbound. ISBN 
0-8186-8921-8. Catalog No. 1921. $88.00. Member $66.00. 

All under one cover, this tutorial assembles a significant body of 
knowledge on systems and software engineering. Emphasis is 
on software requirements analysis and specifications, and on sys¬ 
tem engineering and its interface with software engineering. In¬ 
formation is presented on subjects that are impacted by system 
and software requirements such as verification, validation,
management, costs, and configuration management.

The tutorial is intended for managers; system and software sys¬ 
tem engineers; software engineers, programmers, analysts, com¬ 
puter personnel; hardware engineers; and college-level 
students and professors. 


Papers were selected that either describe the technical “center”
of a project or describe activities that interact with it (e.g.,
project management, configuration management, review and 
walk-throughs). 

Twenty unique refereed papers and reports, a glossary of more 
than 1000 terms applicable to requirements engineering, and a 
bibliography of books and papers with more than 50 entries are 
included. This volume discusses many new approaches to sys¬ 
tem and software engineering. Both original and reprinted pa¬ 
pers are included and organized into chapters according to 
their impact on system and software requirements. 

Sections: Software Requirements Analysis, Specifications, Meth¬ 
odologies and Representation Methods; Software Requirements 
Engineering Tools and Techniques; Testing, Verification, and Re¬ 
views of Software Requirements; Software System Engineering 
Process Models; Managing the Analysis Process. 


Just Published! 

STANDARDS, GUIDELINES, AND EXAMPLES ON 
SYSTEMS AND SOFTWARE REQUIREMENTS 
ENGINEERING 

by Merlin Dorfman and Richard H. Thayer 

CS Press Tutorial. 620 pages. March 1990. Hardbound. ISBN 
0-8186-8922-6. Catalog No. 1922. $72.00. Member $54.00. 

Standards and guidelines for system and software requirements 
engineering, both available and not-so-available, are exam¬ 
ined throughout this text. 

This tutorial is intended for managers; system and software system
engineers; software engineers, programmers, analysts, and
other computer personnel; hardware engineers; and college-level
students and professors. This volume provides a source of system
and software requirements engineering references to aid in 
managing early lifecycle activities. The descriptions of guide¬ 
lines and standards for analyzing a system and/or software re¬ 
quirements specification will be useful to system and software 
system engineers. 


This volume contains general guidelines of procedures and tech¬ 
niques for analyzing and specifying system and software require¬ 
ments, standards for writing better requirements specifications, 
and a method for selecting the most cost-effective software re¬ 
quirements methodologies, representation methods, tools, and 
techniques. Descriptions of how system specifications are parti¬ 
tioned and allocated to software, the techniques and tools for 
analyzing and describing software specifications, and how soft¬ 
ware interfaces with hardware, will be helpful to hardware engi¬ 
neers. 

(It is intended as a companion volume to System and Software
Requirements Engineering and includes a short example of a
software requirements specification based on IEEE Std. 830-1984.)

Sections: International Requirements Standards; US Military Standards;
Requirements Analysis Methodologies and Examples;
Sources of System and Software Engineering Requirements.


Just Published! 

PROCEEDINGS OF THE 12th INTERNATIONAL 
CONFERENCE ON SOFTWARE ENGINEERING 

356 pages. March 1990. Softbound. ISBN 0-8186-2026-9. 
Catalog No. 2026. $78.00. Member $39.00. 


This proceedings presents a well-balanced selection of research
results, experience and workshop reports, surveys,
panels, and forward-looking plenary addresses that together
focus on fundamentals that will be useful for the future of this
technology.

Fifty informative papers explore a number of key topics, including
software re-engineering, safety-critical software, systems
engineering, the types of models used to guide technology
transfer, and reports on previously held workshops (including
the one held just prior to ICSE-12).


Sections: Process Models; Formal Verification; Real-Time and Reactive
Systems; Metrics and Reliability; Software Re-Engineering;
Tools for Formal Verification; Recent Advances in Object-Management
Systems; Prototyping; Design and Architecture; Real-Life
Safety-Critical Software; AI Applications to Software
Engineering; Technology Transfer; Systems Engineering; Configuration
Management; and Experience Using Defined Processes
for Technology Transfer.


SEND ORDERS TO: 

IEEE COMPUTER SOCIETY PRESS 
10662 Los Vaqueros Circle, PO Box 3014, Dept. 006-6
Los Alamitos, CA 90720-1264 

or call toll-free 1-800-CS-BOOKS 
in California call 714/821-8380 or fax 714/821-4010











NEW PRODUCTS 


Contact or send press releases to Nancy Hays, Computer, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1264; Compmail+, n.hays


Amiga 3000 enhances multimedia line 


Commodore Business Machines says 
that its Amiga 3000 enhances the perform¬ 
ance capabilities of the company’s line of 
multimedia products. The PC includes the 
AmigaDOS 2.0 operating system. 

The Amiga 3000 features a 16- or 25- 
MHz Motorola 68030 processor, a 68881 
or 68882 math coprocessor, a 32-bit 
architecture, 2 Mbytes of memory stan¬ 
dard, and multitasking capability. The 
standard configuration comes with a 40- 
Mbyte hard disk drive and a 3.5-inch 
floppy disk drive, plus an SCSI interface. 
A 100-Mbyte configuration is available. 

Suggested retail prices are $3,299 
for the 16-MHz version, $3,999 for the 
25-MHz version, and $4,499 for the 
100-Mbyte configuration. Shipments 
are scheduled for July. 

Reader Service 30 


Commodore’s Amiga 3000 provides
multimedia and multitasking capabilities.


NCR has announced the System 10000
Model 85, which extends that family of
multiuser systems at the high end. According
to the company, Model 85 includes
multiple dyadic processors connected
by the NCR-developed SCSI
interprocessor bus. The new model also
features sharable communications modules
and switchable SCSI peripherals.

An operator controls systems re¬ 
sources for Model 85 through an ITX 
Windows PC console. 

Prices for Model 85 range from 
$485,000 to $600,000. 

Reader Service 31 


Voice processor 
compatible with DOS 

Voice Connexion offers the Micro Introvoice
modular speech-processing system
with a built-in microprocessor. The
voice I/O system provides voice recognition
of 1,000 words with an accuracy of
98 percent, according to the company,
and unlimited text-to-speech synthesis.

With the software supplied, users can 
create vocabularies, edit, voice train, 
test, and maintain the system from a 
standard IBM XT, AT, 386, or compat¬ 
ible computer. The Micro Introvoice 
comes with system software and sample 
vocabularies, battery charger, serial 
cable, and documentation. 

The system includes two miniature 
boards. The motherboard contains an 8- 
MHz NEC V-25 CPU, 128 Kbytes of 
SRAM, up to 128 Kbytes of EPROM, an 
RS-232 interface, battery backup cir¬ 
cuitry, and address decoding logic. The 
daughter board contains recognition and 
synthesis electronics. Microphones avail¬ 
able include a two-way headset (recom¬ 
mended), a one-way headset with desk¬ 
top speaker, and wireless microphone 
options. 

The Micro Introvoice costs $995. The 
two-way headset is $129. Shipments be¬ 
gan in January. 

Reader Service 33 


NCR extends System 10000 family at high end 


Verbex claims continuous speech recognition 


Verbex Voice Systems claims that its
Series 7000 Conversational Voice I/O
System provides continuous speech recognition.
The system reportedly allows
users to speak naturally at their normal
speed, capture data, and perform transactions
using a virtually unlimited vocabulary.

According to Verbex, the Series 7000 
has an active vocabulary of 2,100 words 
expandable to 10,000 words and a total 
vocabulary limited only by computer 
memory. Speakers train the system dur¬ 
ing a one-time training session. 

The Series 7000 incorporates Texas
Instruments’ TMS 320C30 chip, a floating-point
digital signal processor. The
voice recognition software reportedly
follows five times more grammar paths
than earlier models.

Job-specific Voiceware packages are 
available. These contain voice I/O appli¬ 
cation information, including grammars, 
vocabularies, training scripts, and docu¬ 
mentation. 

The Series 7000 comes as a stand¬ 
alone voice peripheral for connection to 
existing computer systems or as a Model 
6000 AT/EISA form factor plug-in 
board. Prices start at $9,600 for the 
stand-alone peripheral and $4,800 for the 
board. 

Reader Service 32 


Voice Connexion’s Micro Introvoice is 
a modular speech-processing system, 
shown here with a two-way headset.

Digital Equipment’s DECstation 5000 Model 200 becomes DEC’s most powerful
RISC-based workstation.


DEC expands RISC, 
DECsystem families 

Digital Equipment has announced new 
members in its RISC and DECsystem 
workstation families. 

The DECstation 5000 Model 200 desk¬ 
top workstation is Digital’s most powerful 
RISC-based workstation, according to the 
company. The new system reportedly in¬ 
corporates a new family of high-perform¬ 
ance graphics options, a new open Turbo¬ 
channel I/O interconnect, and advanced 
chip technology. The DECstation 5000 
Model 200 comes in four modular graph¬ 
ics configurations. Prices start at $14,995. 

The DECsystem 5800 series includes 
multiuser, Unix-based RISC systems em¬ 
ploying an R3000 CPU. The three-proces¬ 
sor DECsystem 5830 and four-processor 
DECsystem 5840 run Ultrix V4 with 
SMP. The DECsystem 5830 supports up 
to 192 Mbytes of memory, while the 
DECsystem 5840 supports up to 128 
Mbytes of memory. Entry prices for the 
5830 and 5840 start at $140,000 and 
$160,000, respectively. 

Digital has also announced three new 
multiuser systems based on SCO Unix 
System V/386 and Intel’s 80386 CPU. 

The new models are the DECsystem 
316+, DECsystem 325, and DECsystem 
333. The systems support entry-level con¬ 
figurations for four to eight users. They 
come with 4-8 Mbytes of memory, 80- 
170-Mbyte disk drives, backup tape, ter¬ 
minal multiplexer, and SCO Unix System 
V/386 (with an unlimited-user license). 


Stratus Computer has announced four 
mainframe-class systems, seven high-end 
systems, and two midrange systems in its 
XA2000 family of continuous processing 
systems. The new systems are compat¬ 
ible with previous-generation XA2000s. 

The midrange and high-end models 
support FTX (Stratus’ Unix System V- 
compatible operating system), VOS (the 
company’s proprietary operating sys¬ 
tem), and the Pick Open Architecture op¬ 
erating environment. The mainframe- 
class systems support VOS. 

The mainframe-class systems are the 
XA2000 Models 2260, 2460, 2660, and 
2860. The largest of the new systems, 
Model 2860, includes 48 duplexed proc¬ 
essors and up to 1 Gbyte of duplexed 
main memory, 249.6 Gbytes of duplexed 
disk storage, and 3,576 communications 
lines. Model 2660 includes 36 duplexed 
processors; Model 2460, 24; and Model 
2260, 12. 


The systems cost $9,690 for DECsystem 
316+, $14,455 for DECsystem 325, and 
$15,360 for DECsystem 333. 


The high-end models are 200-260. 
Models 210-260 replace the previous 
models, 110-160. Models range from the 
single-processor 210 to the six-processor 
260. The new models provide up to 128 
Mbytes of duplexed main memory, up to 
31.2 Gbytes of duplexed disk storage, 
and 440 communications lines. Model 
200 is an entry-level system, upgradable 
to Model 210 and beyond. 

The midrange models, 75 and 80, are 
hardware-based fault-tolerant systems 
with major components duplexed. The 
single-processor Model 75 can be up¬ 
graded to Model 80, with two duplexed 
processors. 

Prices range from $94,000 for the
midrange Model 75 to $9,100,000 for the
mainframe-class Model 2860.

Mainframe: Reader Service 37 
High-end: Reader Service 38 
Midrange: Reader Service 39 


DECstation 5000: Reader Service 34 
DECsystem 5800: Reader Service 35 
DECsystem 3xx: Reader Service 36 


Parallaxis available as 
public domain software 

Thomas Bräunl offers the data-parallel
programming system Parallaxis as public
domain software. According to Bräunl,
the Parallaxis model for structured programming
of SIMD computers states that
each parallel program contains, in addition
to the original procedural algorithm,
a functional description of the requested
processor interconnection topology. This
reportedly results in parallel programs
that are machine independent.

The Parallaxis system consists of a 
compiler and a simulator. It is currently 
available for the Apollo DN3000, Sun-3, 
Sun-4, Sparcstation, IBM AT, and Apple 
Macintosh. 

For information, contact Thomas
Bräunl, Universität Stuttgart, Fakultät
Informatik, Azenbergstr. 12, D-7000
Stuttgart 1, West Germany.

Reader Service 40 


Stratus beefs up XA2000 family with new models


Object-oriented Interactors targets embedded software development


Objective: Systems offers a concurrent 
object-oriented environment for the C++ 
language. Called Interactors, the soft¬ 
ware targets embedded software develop¬ 
ment. According to the company, the 
real-time facilities are added to C++ 
compilers as a source library. 

Interactors requires an IBM PC or
compatible running the Zortech C++
compiler 2.0. Features include multitasking
based on object concurrency,
preemptive scheduling with race controls,
selective waiting, and interprocess
communication.

The software comes with examples 
and a 200-page manual. An initial site li¬ 
cense costs $4,000, with additional user 
and host computer licenses costing $500. 


The company also offers Zlocate, a 
ROM locator for the Zortech C++ com¬ 
piler 2.0. This tool converts MS-DOS 
loadable code into absolute code formats 
suitable for programming ROMs. Zlocate 
costs $300. 

Interactors: Reader Service 41 
Zlocate: Reader Service 42 


Repository-based CASE system benefits multiple team development 


LBMS, the US branch of London- 
based Learmonth & Burchett Manage¬ 
ment Systems, has introduced Systems 
Engineer, a repository-based, multiuser 
information systems development pack¬ 
age. The software operates on IBM PS/2 
and compatible 80386 machines with ex¬ 
tended memories. 

Features include simultaneous data 
sharing by multiple users and multiple 


CASE assists code reuse on 

Soft-set Technologies claims that its 
Aranda CASE product assists Macintosh 
developers in understanding, reusing, 
maintaining, and modifying existing 
source code. Aranda parses source code 
and automatically generates structure 
charts, class hierarchies for object-ori¬ 
ented code, identifier usage charts, and 
logic flow charts. 

The software reportedly provides a fa¬ 
cility to integrate project documentation 
in a single linked database including 
documents and diagrams from other 
CASE applications. Identifier names in 


Telesoft’s Teleuse system permits the 
interactive creation of OSF/Motif user 
interfaces. According to the company, 
Teleuse permits the design, implementa¬ 
tion, and evaluation of standard graphi¬ 
cal user interfaces independently of the 
application code. 

Teleuse consists of four components: 
the graphical layout editor, the D lan¬ 
guage, the user interface builder, and the 
runtime library. 

The Teleuse screen layout editor pro¬ 
vides a WYSIWYG approach to creating 
user interfaces. Developers work with in¬ 
terface objects such as buttons, scroll 
bars, pop-up menus, and dialogue boxes. 


project teams; support for Microsoft 
Windows and IBM’s OS/2 Presentation 
Manager; systems analysis and design 
techniques; and an architecture confor¬ 
mant to SAA, plus adherence to IBM’s 
AD/Cycle. 

Although able to operate as a stand-alone
system, Systems Engineer reportedly
targets team use through local area
networks. An optimistic locking strategy


any document are linked via hypertext to 
identifiers in the source code. 

Aranda is available for Apple’s MPW 
Pascal and Symantec’s Think Pascal. 
According to the company, future ver¬ 
sions will work with C, C++, Cobol, and 
Fortran. The software runs under Finder 
and Multifinder. 

Aranda requires a 68020- or 68030- 
based Apple Macintosh with a minimum 
of 2 Mbytes of memory. The Pascal ver¬ 
sion sells for $995. 

Reader Service 44 


Dialogue management functions in¬ 
clude a rule-based dialogue language 
called the D language. From the D lan¬ 
guage, developers can access a library of 
functions for changing the user interface 
at runtime. They can test and modify the 
interface interactively without regard for 
the associated application code. 

Teleuse is available for the DECstation
family, Data General Aviion, Sun-3,
and Sun-4 workstations for $9,900, including
a runtime license for Motif. No
runtime license is required for programs
built with Teleuse.

Reader Service 45 


ensures resolution of access conflicts on 
line, according to the company. On-line 
rule checking traps errors and potential 
inconsistencies. 

Systems Engineer Level 1.0 costs 
$8,625. 

Training, installation support, and 
maintenance are priced separately. 

Reader Service 43 


Ingres provides 4GL 

Ingres targets the creation of software 
applications on workstations with its vis¬ 
ual programming tool and fourth-genera¬ 
tion development system, Ingres/Win¬ 
dows 4GL. According to the company, 
software developers working with the 
program use a graphical user interface to 
visually build and modify database appli¬ 
cations by selecting objects with a mouse 
and arranging them on the screen. 

Ingres/Windows 4GL consists of four 
modules: Ingres/4GL, the company’s 
fourth-generation language enhanced 
with object-oriented programming con¬ 
cepts; Frame Editor, which interactively 
paints windows; Menu Editor, which 
visually designs menu bars and pull¬ 
down menus; and Application Compo¬ 
nent Catalogues, which tracks applica¬ 
tion elements and coordinates multiple 
developers. 

Initial versions of the software, scheduled
for the third quarter, will run on Sun
Sparc machines and Digital Equipment
VAX/VMS systems. An optional product,
Ingres/Windows 4GL is initially
priced at 35 percent of the Ingres base
product price, which varies according to
the size and configuration of the user installation.
A typical configuration of two
to eight workstation nodes yields a price
of approximately $1,000 per node for
Ingres/Windows 4GL.

Reader Service 46 


System permits creation of user interfaces


Company, Model, Function
Comments
R.S. No.

Analog Devices, ADSP-21msp50, Mixed-signal DSP
Combines a digital signal processor with A/D and D/A converters. Code-compatible with the
ADSP-2100 fixed-point DSP series. Includes program RAM, data RAM, computation functions,
linear ADC and DAC, serial ports, and a parallel host interface port. Comes in a 132-pin plastic
quad flat pack. Sampling in July. Cost (100,000s): starts under $30.
R.S. No. 120

Cirrus Logic, CL-SH265, Disk controller
An XT/AT disk controller with a data rate up to 24 Mbps and buffer throughput up to 8 Mbps. A
power-down reduces power consumption by 75%. Permits master/slave systems, different logical
spaces within buffer memory for caching, read look-ahead, and storage of microcode. Comes
in a 100-pin quad flat pack and 84-pin PLCC. Now sampling. Cost (100s): under $22 (QFP).
R.S. No. 121

Cypress Semiconductor, CY7C611, RISC controller
A Sparc RISC controller for embedded applications. Runs 18 MIPS at 25 MHz. Software-compatible
with the CY7C601 Sparc microprocessor. Supports a direct interface to system memory.
Communicates with external memory via a 24-bit address bus and a 32-bit instruction/data bus.
Comes in a 160-pin plastic quad flat pack. Now sampling. Cost (1,000s): $76.
R.S. No. 122

Infochip Systems, IC-105, Coprocessor
A lossless (noiseless) data compression/decompression coprocessor. Accepts continuous input
data for compression of up to 2 Mbps and decompression of up to 5 Mbps. Includes an 8-bit
microprocessor interface and programmable wait states. Supports clock rates up to 40 MHz. Comes in a
68-pin PLCC or 80-pin plastic quad flat pack. Now sampling. Cost (1,000s): $80.95.
R.S. No. 123

Linear Technology, LTC1051, Op amp
A dual auto-zeroed precision operational amplifier with on-chip sample-and-hold capacitors.
Max offset voltage of 5 µV, max offset drift of 0.05 µV/°C, typical voltage gain of 160 dB,
slew rate of 4 V/µs, and gain bandwidth product of 2.5 MHz with max supply current of 1 mA.
Comes in an 8-pin plastic or ceramic DIP and a 16-pin SOJ. Cost: $4.25 (DIP).
R.S. No. 124

LSI Logic, L64901, Sparc processor
A 32-bit Sparc integer unit with 15-MIPS performance at 25 MHz and 12.5 MIPS at 20 MHz.
Features single-cycle instruction execution, seven overlapping register windows, fast interrupt
response, and configurable address logic. Comes in a 160-pin plastic quad flat pack or 144-pin
plastic PGA. Now sampling. Cost: $86 (20-MHz PQFP).
R.S. No. 125

Micro Devices, MD1220, Neural bit slice
A neural bit slice that supports an eight-neuron-configuration neural network without external
hardware. Controlled by a system of distributed numerical weights. Processing delay for a single
NBS with eight synapses is 7.2 µs. Cost: $50; $395 for NBS Applications Kit.
R.S. No. 126

Micro Linear, ML2261, ADC
An 8-bit A/D converter with sample-and-hold. Uses digital error correction. Max conversion
time in track-and-hold mode of 755 ns. Pin-compatible replacement for ADC0820 and AD7820
ADCs. Comes in 20-pin DIP or surface-mount PCC packages. Cost (100s): starts at $8.95.
R.S. No. 127

National Semiconductor, FDDI family, Chip set
An FDDI LAN chip set with a full-duplex architecture. Consists of five chips: DP83261 basic
media access controller, DP83251/55 physical-layer controller, DP83231 clock recovery device,
DP83241 clock distribution device, and the BMAC system interface (scheduled for the
second half of 1990). Cost: $350 per set (samples).
R.S. No. 128

Sharp Electronics, LH591x, LH592x, DP-SRAMs
Two series of dual-port static RAMs. LH591x is organized 2Kx8 (available now) and LH592x,
4Kx8 (sampling and production in the third quarter of 1990). Both series feature access times of
35, 45, and 55 ns. They come in 48-pin DIP and 52-pin PLCC packages. Cost (100s): $16 and up.
R.S. No. 129

Sharp Electronics, LH5492, FIFO
A clocked FIFO organized 4Kx9 with a look-ahead access architecture. Allows data to be
synchronized with both the read and write clocks. Provides independent control over output buffers
with a separate output enable signal. Comes in 25-, 35-, and 50-ns cycle time versions with 20-,
25-, and 35-ns access times, respectively. Cost (100s): $64.81 for LH5492U-25 in 32-pin PLCC.
R.S. No. 130

Teledyne Solid State, C76 Series, I/O modules
Industrial solid-state computer I/O modules. Feature optical isolation and VDE spacing and
voltage. Output modules allow TTL- or CMOS-level signals to control switching of AC loads up to
1 A/240 Vrms and DC loads up to 0.6 A/60 Vdc. Come in 16-pin TO-116 DIPs. Cost: $6.50-$7.60.
R.S. No. 131

White Technology, WF-1024K8-150, Flash PROM
An 8-Mbit flash PROM memory module. Measures 1.93x1.14x0.185. Consists of eight 1-Mbit
CMOS flash memories on a thick-film substrate. Organized 1Mbytex8. Has an access time of
150 ns. Requires both 5 Vdc and 12 Vdc supply voltages for operation. Comes in a 34-pin
hermetically sealed metal package. Cost: starts at $826.
R.S. No. 132


Microsystem Announcements


Company, Model, Function
Comments
R.S. No.

Analogic, LSDAS-16, Data acquisition board
A 16-bit data acquisition plug-in board for IBM AT and compatible computers. Features 16
single-ended and eight differential analog inputs, an autocalibrating 16-bit ADC, a 50-kHz ADC
sample rate, two deglitched DACs, a digital I/O port, and timers. Cost: $1,395.

Arcom Control Systems, SC68000, STEbus processor
An STEbus processor board that interfaces a 16-bit 68000 with local memory to the STEbus.
Maintains a word-organized architecture across the card. Comes with an 8- or 16-MHz CPU. Has
256 Kbytes of zero-wait-state static RAM on-board. Two sockets house up to 512 Kbytes of
EPROM. Cost: £285 (8 MHz, with 64-Kbyte RAM).

PC-32C, PC-32M, DSP coprocessors
Floating-point digital signal coprocessors for IBM PCs and compatibles. Based on AT&T’s
50-MHz DSP32C. Provide 32-bit floating-point and 24-bit fixed-point data formats. Come with
6,144 bytes of internal memory and 64 Kbytes of zero-wait-state memory. PC-32M supports
more memory than PC-32C. Cost: $1,995 for PC-32C; $2,795 for PC-32M.

AST Research, AST Dual SX/16, PC
A dual-compatible PC for US and Japanese markets. Software-compatible with PCs operating
under the NEC 9801 standard and ISA-standard PCs running DOS. Based on Intel’s 16-MHz
80386SX CPU. Has eight ISA-compatible enhancement-board slots. Comes with 1 Mbyte of
RAM upgradable to 4 Mbytes; supports up to 16 Mbytes total system memory. Prices not given.

C&C Technology, CC-T1M, CC-T4M, TRAM modules
Single-width, transputer-based TRAM modules. Consist of a T800 transputer processor, either
1 Mbyte of 70-ns DRAM (CC-T1M) or 4 Mbytes of 80-ns DRAM (CC-T4M), and four 20-Mbit
message-passing serial links. Available with a 17.5-, 20-, or 25-MHz processor. Cost: $1,295 for
CC-T1M (20 MHz); $2,695 for CC-T4M (20 MHz).

Dolphin Scientific, DSP450, DSP computer
A desktop signal-processing computer with 18 DSP32C floating-point digital signal processors,
each connected to its own 1 Gbps bus. Configurations with up to 40 analog channels, four 32-bit
digital ports, four serial ports, and four RS-232 ports. Macintosh II or IBM ATs. Cost: $59,990.

DTK Computer, Model 386SX, Laptop
A laptop computer based on Intel’s 80386SX CPU. Features a 640x480 VGA LCD display,
detachable keyboard, battery pack, 1 Mbyte of RAM (expandable to 5 Mbytes), one RS-232 serial
port, one parallel printer port, two phone jacks, a 3.5-inch floppy disk drive, and software. Cost:
under $3,000; $3,500 with modem and 40-Mbyte hard drive.

Kontron Electronics, IP Lite, Portable PC
A 22-pound modular portable industrial computer based on Intel’s 80386SX, 80386, or 80486
CPU. Has seven total EISA slots and two RS-232-C serial ports, one switchable to RS-422/485.
Runs DOS, OS/2, or Unix, according to specification. Has a 16-shades-of-gray VGA display and
removable keyboard. Cost: starts at $8,395 for 386SX (1-Mbyte RAM, 40-Mbyte hard drive).

Newer Technology, SCSI Dart, RAM disk system
A memory system with 16 slots that accept snap-in 256-Kbyte and 1-, 4-, or 16-Mbyte SIMMs. In
an external chassis, includes a UPS battery backup. Accepts a second bank for a total 32 SIMM
sockets. Works with IBM and Apple Macintosh PCs. Connects via cable to SCSI interfaces.
Comes with 2 Mbytes of RAM basic and battery backup. Cost: $3,395; $7,000 with 24 Mbytes.

PEP Modular Computers, VDout-32, Digital output interface
A 32-channel optoisolated digital output interface for the VMEbus. Has 32 optoisolated 24 Vdc
digital outputs in four groups of eight lines. Provides read-back of all output lines or eight extra
digital inputs. Draws below 4 W. 3U size. Cost (100s): $540.

Real Time Devices, ADA2000, Data acquisition board
A 12-bit data acquisition and control board for the IBM XT and AT bus. Has 12-bit A/D and D/A
conversion, 5-MHz counting, and digital I/O functions. Supports eight channels of differential or
16 channels of single-ended analog input. Routes all signals through one PC slot. Cost: $589.

Swan Technologies, Swan 486/25, PC
A PC based on Intel’s 80486 CPU. Comes standard with 4 Mbytes of 80-ns RAM, shadow RAM,
8-Kbyte internal cache, five half-height device bays, 1.2-Mbyte 5.25-inch and 1.44-Mbyte
3.5-inch floppy disk drives, eight 16-bit expansion slots, software, and varied hard-drive
configurations. Cost: $6,599 with 180-Mbyte hard drive, VGA color monitor, and Super VGA card.

Visionary Systems, APx series, Parallel processors
A series of massively parallel computer plug-in boards with 16-bit-wide processing elements
and interconnection buses. Fit in IBM ATs. Memory from 512 Kbytes to 4 Mbytes. Two
configurations: the two-board AP64 and the three-board AP128. Porting software requires changing
computation-intensive subroutines only. Cost: $12,995 for AP64; $19,995 for AP128.


PRODUCT REVIEWS


Editor: Richard Eckhouse, UMASS-Boston, Harbor Campus, Boston, MA 02125, Compmail+, r.eckhouse; Bitnet, eckhouse@umbsky; CompuServe, 70516,556 


We present two themes and a follow-up review in this issue. The first has to do with display adapters and monitors for the PC. 
The second covers scanners, both hand-held and full-page, plus OCR software to convert the printed word into a machine-read¬ 
able form. Contributors to these two themes include Noah Davids, Sorel Reisman, and myself. Ruth Maulucci did a follow-up re¬ 
view of Asyst, first covered in the March 1989 issue of Computer (pp. 81-84). In the earlier review, we raved about and criticized 
the product. A recent revision includes significant changes and thus warranted its inclusion as another full-length evaluation by a 
different reviewer. — Richard Eckhouse 

Displaying at its best 


To really notice a significant improve¬ 
ment in the perceived quality of your 
computing, nothing beats a high-resolu¬ 
tion, high-quality monitor and display 
adapter. As the buying public has moved 
toward VGA systems, monitor and dis¬ 
play adapter vendors have started to offer 
significantly enhanced products that go 
considerably beyond yesterday’s stan¬ 
dards. We will tell you about a couple of 
excellent buys indicative of the values 
now available. 

Eclipse VGA 

Without question, the Eclipse VGA 
from Prism Imaging Systems is one of 
the best display adapters we have seen. 
Even if you already own a VGA card, the 
Eclipse VGA gives you at least four rea¬ 
sons to consider buying another one. 
First, this product offers super VGA 
resolutions generally not found on other 
boards. Second, it works equally well in 
these higher resolutions with either a 
VGA or EGA monitor. Third, it offers 
complete register compatibility with all 
previous graphics standards, including 
MDA, HGC, CGA, EGA, and VGA. And 
fourth, the builders are so assured of the 
quality of their product, they offer it with 
a lifetime guarantee. 

If that doesn’t capture your interest, 
you’d better go back and reread the para¬ 
graph. This super display adapter is un¬ 
like any other. It comes standard with 11 
high-resolution modes: 320x200x256, 
512x512x256, 640x400x256, 
640x480x256, 720x540x256, 
800x600x16, 800x600x256, 
1,024x768x2, 1,024x768x4, 
1,024x768x16, and 1,280x640x16 col¬ 
ors. The card automatically detects the 
host bus configuration (8- or 16-bit), 
senses the monitor type, and selects the 
software display mode. Given all these 
resolutions, it comes as no surprise that 
132-column text mode is also available. 

We tested the board inside a 20-MHz
PC Genius 386 machine using both the
Hitachi 19-inch multifrequency VGA
monitor and the Tatung 14-inch multifrequency
EGA monitor. Actually, we tried
the EGA test out of curiosity, and it
worked so well, it both impressed and
pleased us. After all, the video bandwidth
of the EGA monitor supposedly
does not allow it to operate at
1,024x768, but that’s one of the features
of this board that makes it so unusual: it
works with all standard TTL and VGA
color monitors in addition to multifrequency
ones.

The well-made, half-length Eclipse 
VGA board includes a full 512 Kbytes of 
socketed RAM. It has the usual set of dip 
switches for a board of this type, allow¬ 
ing you to configure the power-up moni¬ 
tor mode. A full set of utilities lets you 
test and select the different video modes. 
Included is a shadow RAM utility for 
moving the board’s BIOS out of ROM 
and into faster RAM memory. 
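
As a rough check on the mode list given earlier, the arithmetic below estimates how much display memory each advertised mode needs and compares it with the 512 Kbytes on the board. This is only a sketch under a simplifying assumption, not a description of how the Eclipse hardware stores pixels: it treats every mode as a packed frame buffer of width x height x bits per pixel and ignores planar layouts, padding, and off-screen memory. The names `frame_bytes`, `MODES`, and `VIDEO_RAM_BYTES` are ours, chosen for illustration.

```python
# Sketch: does each advertised mode fit in 512 Kbytes of display RAM?
# Assumption (ours): packed pixels, no padding, no off-screen memory.

VIDEO_RAM_BYTES = 512 * 1024  # 512 Kbytes of socketed RAM, per the review

# (width, height, colors) for the 11 modes listed in the review
MODES = [
    (320, 200, 256), (512, 512, 256), (640, 400, 256), (640, 480, 256),
    (720, 540, 256), (800, 600, 16), (800, 600, 256), (1024, 768, 2),
    (1024, 768, 4), (1024, 768, 16), (1280, 640, 16),
]


def frame_bytes(width, height, colors):
    """Bytes needed for one screen at the given resolution and color count."""
    bits_per_pixel = (colors - 1).bit_length()  # 2 -> 1, 16 -> 4, 256 -> 8
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    for width, height, colors in MODES:
        needed = frame_bytes(width, height, colors)
        verdict = "fits" if needed <= VIDEO_RAM_BYTES else "does not fit"
        print(f"{width}x{height}x{colors}: {needed} bytes, {verdict}")
```

Under that assumption, the most demanding listed mode is 800x600 in 256 colors at 480,000 bytes, which fits in 524,288 bytes. A 1,024x768 mode in 256 colors would need 786,432 bytes, which is consistent with the list stopping at 16 colors for that resolution.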

What surprised us was the large number
of high-resolution drivers included.
They allow you to use this board with
AutoCAD, Autoshade, Lotus 1-2-3,
Symphony, GEM, Microsoft Windows,
Ventura Publisher, Pagemaker, WordPerfect,
and Wordstar. Of course, you don’t
need these if you operate in normal VGA
modes, but with the capabilities of the
Eclipse board, why not expand?

We tested all the higher resolution 
drivers (800x600, 1,024x768, and 
1,280x640) for Windows/386. Each in¬ 
stalled as expected using the Windows 
setup program, then ran flawlessly. On a 
19-inch monitor the results were superb. 
If ever anyone had a reason to own such 
a monitor, this board would be it. The re¬ 
sults look nearly equivalent to what we 
customarily see on a Sun color worksta¬ 
tion, but at a fraction of the cost. 

This product features a well-done 
user’s manual. Detailed and complete, it 
also provides additional, very useful in¬ 
formation. The first chapter explains 
compatibility, while the second and third 
cover installing the board inside your 
computer and daily operation. Chapter 
four tackles troubleshooting, should you 
ever need it, while chapter five presents 
the multitude of software utilities and 
drivers. Chapter six and the appendix
serve those who need technical information,
including how to wire up a 9-to-15-pin
conversion cable.

Hitachi America’s Accuvue 20-AS

We started off by calling the Eclipse 
VGA an outstanding product. Thus, it 
should come as no surprise that we rec¬ 
ommend it highly. It is available from 
Prism Imaging Systems, 5309 Randall 
Place, Fremont, CA 94538, phone (415) 
490-9360, fax (415) 490-9342 for a very 
reasonable $599 list price. The company 
also offers a full line of VGA boards, 
from the $199 Prism VGA to the $299- 
$349 Elan/Elan+ series to the $349-$499 
Elite/Elite+ series. Each provides a sub¬ 
set of the features and memory found on 
the top-of-the-line Eclipse board covered 
in this review. 

Hitachi HM-4319-D

Many workstation users are quite fa¬ 
miliar with 19-inch displays because you 
need one if you use X Windows or the 
like. A 14-inch or smaller display would 
be nearly unreadable. But, for most of us 
PC users, a color 19-inch monitor remains
more of a dream than a reality. The
catch is price. Typically, such a monitor 
retails for around $3,500. Discount 
houses offer them for about $1,200 less
(or about a third off), but for the same 
amount of money you could buy a 20- 
MHz PC. So, on a price-only basis it’s 
hard to justify. 

On the other hand, when your work 
involves computer-aided design, a 19- 
inch monitor becomes necessary. Like¬ 
wise, going blind with AutoCAD or 
Windows running in 800x600 (or 
greater) display mode makes it easy to 
justify a larger monitor. When you start 
taking advantage of the 1,024x768 and 
higher resolutions, no solution short of a
19-inch monitor is worth considering.

We first saw the Hitachi HM-43 series 
demonstrated at a computer show having 
a high ambient light level. Immediately 
impressed with the brightness, sharpness, 
and high contrast of the Hitachi displays, 
we requested one for review. The unit we 
received is part of a series that includes 
17-, 19-, and 20-inch displays, in both 
auto scan and professional models. From 
the wide range of models you can select 
horizontal frequency ranges from 47-78, 
30-65, 72-78, 67-72, 60-65, and 47-52 
kHz, with a vertical frequency range 
from 55-80 Hz in noninterlaced mode. 
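
As a rough guide to matching a display mode to one of these horizontal frequency ranges, the line rate is approximately the number of scan lines per frame (visible lines plus vertical blanking) multiplied by the vertical refresh rate. The sketch below is ours, not Hitachi's. The 5 percent blanking allowance and the 60-Hz refresh rate are assumptions chosen for illustration, and the timings of a real display adapter will differ somewhat.

```python
# Rough estimate of the horizontal scan rate a noninterlaced mode needs.
# ASSUMPTIONS (not from the review): 5% of each frame is vertical blanking
# and the adapter refreshes at 60 Hz. Real adapter timings differ.

def horizontal_khz(visible_lines, refresh_hz=60.0, blanking_fraction=0.05):
    """Return the approximate horizontal scan frequency in kHz."""
    total_lines = visible_lines * (1.0 + blanking_fraction)
    return total_lines * refresh_hz / 1000.0


def fits(khz, low_khz, high_khz):
    """True if the estimated line rate falls inside a monitor's range."""
    return low_khz <= khz <= high_khz


if __name__ == "__main__":
    for width, height in [(640, 480), (800, 600), (1024, 768), (1280, 1024)]:
        khz = horizontal_khz(height)
        # 30-65 kHz is one of the ranges listed above.
        print(f"{width}x{height}: about {khz:.1f} kHz, "
              f"within 30-65 kHz: {fits(khz, 30, 65)}")
```

Changing `refresh_hz` shows how quickly a higher refresh rate pushes a mode toward the top of a monitor's range.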

Front panel controls include power on/ 
off, momentary degaussing, brightness, 
and contrast. Inside a side panel lies the 
horizontal frequency scan indicator, di¬ 
vided into three columns indicating a fre¬ 
quency range of 30-36, 47-52, and 60-65 
kHz. For each range you can adjust the 
horizontal and vertical centering, width, 
and height using a special plastic tool 
supplied with the unit. The Z-version 
(the one we tested) has four VGA mode 
indicator lights; the one lit depends on 
the horizontal and vertical synch polar¬ 
ity. Most of us don’t need this indicator 
because we really don’t care about the 
mode, but we found it interesting to see 
what was going on. Incidentally, the 
Hitachi performed the best of the moni¬ 
tors we’ve used when switching between 
horizontal frequency ranges. It switched 
ranges so quietly and quickly that we had 
to look at the indicators to even know it 
had occurred. 

We tested the Z-version (30-65 kHz) 
with the Prism Imaging Systems Eclipse 
VGA in all the typical resolutions, in¬ 
cluding 640x480, 800x600, 1,024x768, 
and 1,280x680. The only available reso¬ 
lution we did not test was 1,280x1,024. 
Each of our tests produced excellent results:
good color, no noticeable distortion,
and sufficient latitude in the brightness
and contrast controls to adjust for a
wide range of lighting conditions. As 
you might guess, Windows/386 in these 
higher resolutions was as readable as in 
lower resolutions on a 12- or 14-inch 
monitor. The difference is that you can’t 
easily put the 19-inch monitor on top of 
your computer, since it weighs about 57 
pounds and measures 19 inches wide by 
18 inches high by 20 inches deep. Even 
on a desk, you will give up a consider¬ 
able amount of space. 

Dot pitch is 0.31 mm. Each signal in¬ 
put line includes a 75-ohm termination 
switch, which might be necessary for 
improperly terminated connector cables 
(it wasn’t in our case). A tilt and swivel 
base is part of the cabinetry. 

About the only disappointing aspect of 
this outstanding monitor is that Hitachi 
does not sell a cable to go with VGA dis¬ 
play adapters. The monitor uses five 
BNC connectors for red, green, blue, 
horizontal, and vertical synch inputs. 
Thus, you probably can’t easily find an 
off-the-shelf cable, and you might, as we 
did, have to make one specially or con¬ 
tract for one to be made. 

Clearly, the price of the unit — $3,440 
— is a serious consideration. But, based 
on our assessment, we can easily recom¬ 
mend this top-of-the-line 19-inch moni¬ 
tor. 

The monitor is available through 
Hitachi America, 6 Pearl Court, Allen¬ 
dale, NJ 07401, phone (201) 825-8000, 
fax (201) 825-4761. 

Toucan-VGA

To really take advantage of a CRT’s 
quality requires an adapter board that 
matches the specs of the display. The 
automode-switching Toucan-VGA 1024 
display adapter achieves that objective in 
most cases. With 256 Kbytes of RAM 
(expandable to 512 Kbytes), the board 
supports the following resolutions: 
1,024x768 with 16 colors; 800x600, 
640x480, and 320x200 with 256 colors; 
640x350 and 640x200 with 16 colors; 
and 720x348 in monochrome. It costs 
$199. 
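
To see how these modes relate to the amount of memory on the board, you can estimate the storage one full screen needs. The sketch below is our own arithmetic, not anything from the Toucan documentation. It assumes each pixel takes log2(colors) bits with no padding, and real boards may organize their memory differently.

```python
import math

# Bytes of display memory needed for one full screen in a graphics mode.
# ASSUMPTION: pixels are packed at log2(colors) bits each with no padding.

def frame_bytes(width, height, colors):
    bits_per_pixel = math.ceil(math.log2(colors))
    return width * height * bits_per_pixel // 8


# (width, height, colors) for the modes listed in the review;
# monochrome is treated as 2 colors.
MODES = [
    (1024, 768, 16),
    (800, 600, 256),
    (640, 480, 256),
    (320, 200, 256),
    (640, 350, 16),
    (640, 200, 16),
    (720, 348, 2),
]

if __name__ == "__main__":
    for width, height, colors in MODES:
        need = frame_bytes(width, height, colors)
        verdict = "fits in" if need <= 256 * 1024 else "exceeds"
        print(f"{width}x{height}, {colors} colors: {need} bytes, "
              f"{verdict} 256 Kbytes")
```

By this estimate the three largest modes need more than 256 Kbytes, which suggests (the review does not confirm it) that they rely on the 512-Kbyte expansion.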

Installing this 16-bit board is very 
straightforward. You simply set the ap¬ 
propriate dip switches to match the board 
to your display. In addition to Seiko, 
Sony, and NEC multifrequency monitors, 
the board supports IBM PS/2 analog 
monitors and variable-frequency displays 
up to 70 MHz. 

You can easily install supporting soft¬ 
ware by copying a few files to a hard 
drive subdirectory. Modification of 
config.sys with two new drivers speeds
up video BIOS operations (fastbios.sys) 
and provides the necessary support for 
the extended screen modes (eansi.sys, an 
ansi.sys-compatible device driver). 

Some of the other software utilities al¬ 
low you to type command lines that 
change display modes among the following:
132x25, 132x28, 132x44, 40x25,
80x60, 80x25, and 100x40 character/row 
resolutions. You can also change the 
scan line resolution for the 40/80 charac¬ 
ter/line options to “the most pleasing text 
resolution.” 

We used the command line syntax to 
generate all of these options except for 
the 132 character/line modes. The selec¬ 
tion of any of these caused the display to 
“blip” and the screen to go black. In an 
attempt to determine the problem, we
called and left a number of messages
with the company, but no one ever re¬ 
turned our calls. 

Another pair of intriguing utilities described
in the manual, Hotkey and
Hotzoom, allegedly work with graphics
programs, allowing you to zoom and pan 
around a bitmapped image. We could not 
make these work, either. 

Despite the problems with these little- 
used or not-too-useful command line 
driven programs, the board is useful be¬ 
cause of the software drivers shipped 
with it. The included drivers provide ex¬ 
tended video support for Lotus 1-2-3, 
Symphony, Autodesk, AutoCAD (V 
2.62, 9.0, and 10.0), Ventura (V 1.1), 
GEM, and Windows. 

We reinstalled Windows, selecting the 
driver supplied with the Toucan board.
The results were absolutely delightful.
The driver worked just fine in the 
132x44 character mode, and we were 
able to read the microscopic (10-point) 
text that Word For Windows displays in 
Print Preview mode. While you could not 
work with this text on a 14-inch monitor, 
you can on a 19-inch monitor. A larger 
monitor benefits anyone needing a high- 
resolution graphics display for serious 
design or desktop publishing applica¬ 
tions. 

For more information on this board or 
its availability, you can reach Communi¬ 
cations Inter-Globe at 633 McCaffrey, 
St-Laurent, Quebec, Canada H4T 1N3, 
phone (514) 738-6580 or (800) 635- 
0318. 

Intelligent character recognition


Over the years we’ve followed the 
progress in optical character recognition. 
In fact, we can remember working with a 
manufacturer of mainframes that adver¬ 
tised its capability in OCR by using an 
OCR-A typeball as the printing element 
for much of its business correspondence 
(although best suited for the movie indus¬ 
try, in our opinion). This typeface carried 
a stylistic connotation to portray just the 
right element of the mystical about com¬ 
puting to suitably impress the public. 

Over the years, optical character recog¬ 
nition has improved considerably. First, 
OCR-B provided a much more readable 
typeface. Second, more sophisticated 
OCR software was no longer limited to 
recognizing only specific typefaces in 
fixed sizes. Third, you no longer needed 
the power of a large mainframe along 
with an expensive scanner to read a 
typical document. Today, relatively inex¬ 
pensive microcomputers and hand-held 
scanners can reduce this considerable 
task to an easily managed one. With the 
right equipment, you can translate the 
printed page to an ASCII file in less than 
a minute. 

Mitsubishi scanner 

With the advent of word processors 
and databases that can include graphic 
images comes a requirement for a new 
input device. We need a device that can 
scan an image on paper and transfer it to 
a file on the PC. The Mitsubishi hand¬ 
held scanner is one of a number of low- 
cost (under $1,000) devices now on the 
market. 


System description. The Mitsubishi 
scanner consists of four pieces: the ac¬ 
tual scanner, an optional page feeder, a 
control board, and the controlling software.

The scanner (SP-MH216AF) measures 
10.3 inches long by 1.7 inches wide by 
2.2 inches high. It weighs 21 ounces. The 
maximum scanning width is 8.5 inches. 
The scanner’s resolution is 8 dots per 
millimeter, which is approximately 200 
dots per inch. The drop-out color is red. 
Its five-foot cord plugs into the back of 
the control board or the page feeder from 
which it gets its 300 mA of power. 
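
The conversion from dots per millimeter to dots per inch, and the amount of raw data a scan produces at that resolution, can be worked out directly. The sketch below is our illustration only. It assumes one bit per dot (as for text/line graphics) with each row rounded up to a whole byte, since the review does not describe the scanner's internal data format.

```python
MM_PER_INCH = 25.4


def dots_per_inch(dots_per_mm):
    """Convert a resolution in dots per millimeter to dots per inch."""
    return dots_per_mm * MM_PER_INCH


def bilevel_bytes(width_in, height_in, dots_per_mm):
    """Uncompressed size of a 1-bit-per-dot scan.

    ASSUMPTION: each row is rounded up to a whole number of bytes.
    """
    dpi = dots_per_inch(dots_per_mm)
    dots_wide = round(width_in * dpi)
    rows = round(height_in * dpi)
    bytes_per_row = (dots_wide + 7) // 8
    return bytes_per_row * rows


if __name__ == "__main__":
    print(f"8 dots/mm is {dots_per_inch(8):.1f} dpi")
    print(f"8.5x11-inch page: {bilevel_bytes(8.5, 11, 8)} bytes")
```

At 8 dots per millimeter the exact figure is 203.2 dpi, which is why the specification rounds it to 200.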

The page feeder (SP-MH01FA) meas¬ 
ures a compact 12.5 inches long by 3.8 
inches high by 4.2 inches wide. A paper 
tray extends 7.8 inches. The feeder ac¬ 
cepts paper up to 120 micrometers 
(which, a paper salesman told us, is a 
heavy weight paper — almost, but not 
quite, a light cardboard). A five-foot 
cord plugs into the control board. The 
page feeder, which requires 400 mA of 
power, has an on/off switch controlling 
the power to both the page feeder and the 
scanner when connected to the page 
feeder instead of the control board. It 
also has a plug for an external 12-volt 
power supply, but a transformer is not 
included as part of the package. 

The full-length 8-bit control board is 
designed for an XT or AT bus. It draws 
600 mA of power. 

The software, version 1.1 and dated 
March 16, 1989, comes on one 5.25-inch 
360-Kbyte floppy disk. The documenta¬ 
tion says that your system must have 640 
Kbytes of memory and 1.2 Mbytes of 
free disk space to run the software. 


Installation. Installation of the control 
board was not simple. The manual care¬ 
fully identifies the five sets of jumpers 
on the board, but does not fully explain 
what each does or how you should set 
them. The default settings disabled our 
mouse. Even after we changed the set¬ 
tings and got our mouse working, we still 
couldn’t get the software to read the data 
being sent by the scanner. We ended by 
calling technical support, which supplied 
a workable set of jumper settings. 

We found the position of the jumpers 
— at the bottom of the board — less than 
optimal. To change the settings, we had 
to remove the board to get at the jump¬ 
ers. What made things worse was that the 
full-length control board barely fit into 
our small-footprint AT. We don’t know 
if the fault lies with a slightly longer
than spec board or a slightly smaller than 
spec case on the PC, but it’s something 
to consider. 

You install the scanner simply by 
plugging it into the control board or the 
page feeder. The page feeder plugs into 
the control board. The paper tray and the 
scanner both snap into place. The only 
problem is what to do with the extra four 
feet of scanner cable when the scanner is 
connected to the page feeder. 

You install the software by copying it 
from the distribution disk. It’s made up 
of 22 help files, eight font files, a “read 
me” file, and the scanner program. It re¬ 
quires a total disk space of 320,594 
bytes. In addition, running the scanner 
program creates two other data files, 
which use another 1,524 bytes. During 
the scanning process, a temporary file 
(approximately 1 Mbyte) is also created. 
If you are short of disk space, you can
delete the “read me” and help files to
save 25,600 bytes.

[Photo caption: Mitsubishi Electronics’ page scanner module]

Documentation. The quarter-inch 
thick manual is bound like a paperback 
book, so it won’t stay open on a given 
page unless you break the binding or use 
a paperweight. For some reason it’s also 
punched for a PC-standard (half-sized) 
three-ring binder. With the exception of 
the control board jumpers, the manual 
contains an excellent description of how 
to unpack and install the system. It also 
contains a good description of how to 
use the software. However, it comes with 
a nine-page corrections pamphlet dated 
July 1988. You would think that after 
almost two years they would have re¬ 
printed the manual. Most of the correc¬ 
tions are minor typos (spelling, capitali¬ 
zation, abbreviations), but a few cover 
significant corrections to the program 
description. The manual has no index, 
which made finding information more 
difficult than it should have been. 

The “read me” file contained one-line 
descriptions of changes between the vari¬ 
ous releases, a list of switches used to 
invoke the software, and a list of “deep” 
error codes and their meanings (which 
you should never get, because they are 
useless to you; programs normally ask 
you to report such errors to the com¬ 
pany). The manual also included a set of 
error codes and their meanings. This is 
always nice to have, but we should point 
out that we never encountered any errors, 
so we never had to refer to this portion of 
the manual. 

Usage. In general, the user interface to 
the software is very good. You see what 
looks like a simple black-and-white paint 
program, so anyone familiar with this
type of program should get along with
this package without much problem. A 
mouse, although not necessary, makes it 
much easier to use the program. Unfortu¬ 
nately, the mouse cursor jumps around a 
lot. When you move the mouse cursor 
from one menu option to another, the 
new option is immediately inverted and 
the mouse cursor jumps to the left mar¬ 
gin of the option, even if it started out 
near the right or center of the option. 

This jumping around took some getting 
used to. You select an option by clicking 
the left mouse button. 

The on-line help provides short 16-line 
descriptions of all the functions, but they 
are organized and labeled differently 
than the menu options. These short de¬ 
scriptions are meant as a quick reference, 
not as a replacement for the manual. 

The program records in one of its data 
files the last directory and file name that 
an image was written to or loaded from 
and displays that directory and file name 
as the default for the next write or load 
operation. In its other data file it records 
the last set of startup switches used, and 
will use those switches as the defaults 
during the next startup. If for some rea¬ 
son you want to change any of the 
startup switches, the program does dis¬ 
play at startup time instructions for en¬ 
tering a menu subsystem to select a new 
set of switches. 

Scanning is simple. After you select 
Scan from the menu bar, a dialog box 
lets you pick the type of document to 
scan. The choices are 

• image type: text/line graphics or 
halftone 

• image polarity: black on white or 
white on black 

• page width: 3, 5, 7, or 8.5 inches 

• page height: 5, 8.5, 11, or 14 inches


The radio buttons used to select these 
choices could be bigger and easier to 
read. An unselected option is indicated 
by two small concentric circles. A selected
option has the inner circle filled in.

Once you have set all the options, you 
select the scan button. A percent-scanned 
gauge is displayed with the indicator sit¬ 
ting on 0% until you press the scan but¬ 
ton on the scanner. At that point scan¬ 
ning starts, and the gauge moves toward 
100%. This takes a few seconds. Once 
scanning is completed, a new gauge ap¬ 
pears while a picture is constructed. This 
gauge also moves toward 100% very 
quickly. After a few seconds a com¬ 
pressed image of the page is displayed. 
From pressing the scan button on the 
page feeder to the image being displayed 
took our PC (a 12-MHz 286 with 640
Kbytes of memory and 340 Kbytes of 
disk cache) 18 seconds for an 8.5x11 
sheet of paper. The speed was the same 
for both text/line graphics and halftone. 

Besides the ability to save the scanned 
image in PCX, TIFF, or ATX format,
the program provides a limited set
of editing features. Perhaps the most im¬ 
portant is the ability to cut a portion of 
the image and save just that portion to a 
file. You can also convert images from 
one format to another by reading an im¬ 
age in format 1 and saving it in format 2. 
You can flip an image horizontally, ver¬ 
tically, or through any angle, or invert it. 
Since the original display shows a com¬ 
pressed, full-page image, you can also 
zoom in and see more detail. You can 
then use scroll bars to move around the 
zoomed image, but you really need a 
mouse. The keyboard controls worked 
fine on everything else, but not on the 
scroll bars (this is documented in the cor¬ 
rections pamphlet). 

The PCX and TIFF files that we pro¬ 
duced were accepted by our other soft¬ 
ware packages without problems. In fact, 
the only problem we found with the soft¬ 
ware was that once you have started an 
image rotation, you cannot stop it — and 
image rotations can take a very long 
time. The percent-completed gauge 
moved so slowly that at first we thought 
our PC had hung. 

Using the scanner/page feeder is ex¬ 
ceptionally simple. You just insert the 
paper and press the scan button on the 
page feeder. The power light on the page 
feeder (normally red) turns green when a 
piece of paper has been inserted far 
enough for the rollers to grab it. The scan 
button will not work until you have in¬ 
serted a piece of paper. To stop a scan in 
progress, just press the scan button 
again. 

Hand scanning, although also simple 
in principle, takes some practice. The 
unit is light enough so that holding it
presents no problem. You must press and 
hold down the scan button on the scanner 
while scanning progresses. Releasing the 
button indicates that scanning is done. 

The hard part is learning how fast to 
move the scanner across the page. If you 
move the scanner too fast or too slow, 
the image comes out stretched or com¬ 
pressed. 

The scanner also has a contrast dial, 
used in both the hand-held and page- 
feeder modes. Careful adjustment of this 
dial makes a big difference in the quality 
of the scanned image. 

We have only two complaints about 
using the hardware. First, the contrast 
adjustment dial needs to be bigger and 
have some kind of scale on it. Second, it 
should be possible to use the scanner in 
its page-feeder and hand-held modes 
without having to replug everything. 
When plugged into the page feeder, the 
scanner cannot be used in its hand-held 
mode. The manual states that your PC 
should be powered off before plugging 
or unplugging either the scanner or page 
feeder into or from the control board. 
They mean it. The pins on the plugs are 
very close together, and it’s easy to cre¬ 
ate a short. We discovered this the hard 
way. Luckily, neither the scanner nor our 
PC was permanently damaged. 

We scanned pages using both the 
page-feeder and hand-held modes. We 
tried color and black-and-white text/line 
graphics and halftone images on news¬ 
print and glossy (coated) magazine pa¬ 
per. The quality of the color images var¬ 
ied considerably depending on the colors 
in the image. The quality of the halftone 
images also varied considerably and was 
very sensitive to the contrast setting on 
the scanner. The black-and-white text/ 
line graphics always scanned well. The 
type of paper made no difference. 

Support. Neither the manual nor the 
on-line documentation included informa¬ 
tion on how to get telephone support. We 
finally phoned the 800 number on the 
warranty card to locate the nearest serv¬ 
ice center, which gave us the number for 
technical support (213/217-5732). The 
tech support people seemed competent. 
They provided the help that got us going 
and answered all our questions about de¬ 
tails. The phone line was sometimes 
busy, but we didn’t have a real problem 
getting through. We were put on hold 
very briefly a few times, and once we got 
a recording telling us to leave a message 
and they would get back to us. It took 
about six hours, but they did call back. 

Summing up. The software and hard¬ 
ware worked flawlessly. We would have 
designed a few things differently, but we
have no significant complaints. We
found that, with careful adjustment of the 
contrast dial, we could produce accept¬ 
able images from black-and-white origi¬ 
nals; images from color originals were 
more of a problem. The PCX and TIFF 
files produced worked with other graph¬ 
ics programs without problems. Tele¬ 
phone support was also very good, al¬ 
though hard to track down and not a toll- 
free number. 

Contact Mitsubishi Electronics at 991 
Knox St., Torrance, CA 90502, phone 
(213) 515-3993. 

Catchword and Prodigy OCR

Catchword and Prodigy OCR (POCR) 
are OCR software products that operate 
with hand-held scanners. Catchword is 
available from Logitech, 6505 Kaiser 
Dr., Fremont, CA 94555, phone (415) 
795-8500, for a list price of $199. Prod¬ 
igy OCR comes from KYE International, 
12675 Colony St., Chino, CA 91710, 
phone (714) 590-3940, with a retail price 
of $49. 

The products. We tested Catchword 
with the Scanman Plus, a 4-inch hand-held
scanner with adjustable resolutions
ranging from 100 dpi to 400 dpi. These
resolutions allow the device to recognize 
text whose size ranges between 6 points 
and 20 points. We tested POCR with the 
Geniscan-4500, almost identical to the 
Scanman Plus. Except for the color and 
shape of the plastic, both devices have 
identical controls for adjusting scan 
shade, dither density, and resolution. In 
fact, the Geniscan-4500 even worked 
with Logitech’s interface board and 
Catchword software. Unfortunately, nei¬ 
ther the Logitech scanner nor the inter¬ 
face board worked with POCR (or with 
the other accompanying software from 
KYE International). 

Both packages installed easily. Upon
execution, Catchword presents a three-option
pull-down menu: one option for
file operations, one to enable the scan¬ 
ning modes, and one to fine-tune the al¬ 
most completely automatic OCR process. 
POCR presents a more conventional 
menu, allowing you to use function keys 
to select from up to 10 options. In fact, 
the list is so extensive that our expecta¬ 
tions of the package exceeded those for 
Catchword. Options include pitch selec¬ 
tion, sensitivity to touching characters, 
input to text or TIFF files, and font selec¬ 
tion. POCR also has options to “learn” 
text patterns as new, savable fonts. All 
these options are reminiscent of features 
found in more expensive packages. 

System requirements for both are an 
IBM-compatible PC, 640 Kbytes of 
memory, and a graphics display adapter 
(such as an HGC, CGA, EGA, or VGA 
board). Catchword requires DOS 3.0 or 
higher, a hard disk drive, and either a 
3.5- or a 5.25-inch floppy disk drive. 
Prodigy OCR requires DOS 2.0 or higher 
and a 5.25-inch floppy disk drive. 

In general, both packages allow you to 
scan vertical strips of text or a full page 
(vertically, in two passes). Catchword 
also accepts a right-to-left horizontal 
scan. Catchword recognizes the white 
space that delineates vertical strips (such 
as newspaper columns); the software 
automatically assembles two passes over 
a full page if the passes have at least 
one-fourth inch of overlap between them. 
Prodigy is a lot less flexible. For ex¬ 
ample, to scan an 8-inch page, you first 
scan the left side, and the text appears in 
a left window on the screen. Then you 
scan the right side, and the text appears 
in a right window. Using cursor control 
keys, you must move the windows until 
appropriate lines overlap to merge the 
output of the two respective scans. 
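
The idea of assembling two overlapping passes can be illustrated with a small sketch that works on already-recognized text. This is not Catchword's or Prodigy's method, which the review does not describe and which presumably works on the scanned image. It assumes the two passes are already aligned line by line, and the function names and the three-character minimum overlap are ours.

```python
def merge_line(left, right, min_overlap=3):
    """Join two text fragments, dropping the duplicated overlap if one is found.

    Tries the longest possible overlap first: the end of `left` must equal
    the start of `right`.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    # No reliable overlap found: keep both pieces, separated by a space.
    return left + " " + right


def merge_passes(left_lines, right_lines, min_overlap=3):
    """Merge two passes that are already aligned line by line (an assumption)."""
    return [merge_line(l, r, min_overlap) for l, r in zip(left_lines, right_lines)]


if __name__ == "__main__":
    left = ["The quick brown", "jumps over the la"]
    right = ["brown fox", "the lazy dog"]
    for line in merge_passes(left, right):
        print(line)
```

Prodigy's manual window alignment amounts to doing the line-matching step by hand.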

Evaluating the products. The real 
measure of OCR software lies in how 
well it recognizes character patterns. In
tackling this, POCR encountered disas¬ 
ter. Unlike Catchword, POCR performed 
terribly. It did so badly that we recom¬ 
mend you completely avoid it — even 
though we actually preferred the hand 
scanner as well as the accompanying 
software utilities. 

POCR accepts scanner input only for a 
fixed period of time. While it scans, the
characters appear on the screen. When it 
times out — and it does that very quickly 
— it gives you statistics regarding its 
performance during the scan. Or, more 
correctly, it gives you statistics that re¬ 
flect its perception of its performance. 
For example, in one scan in which it in¬ 
correctly recognized almost all charac¬ 
ters, POCR reported a 100 percent recog¬ 
nition rate. 

On the other hand, Catchword operates
quite acceptably. It compares scanned 
letters, or objects, to ASCII models 
(models are provided for English,
French, German, Italian, and Spanish
character sets). You must preselect the
language before scanning. The input/rec¬ 
ognition process is 

(1) scan the text; 

(2) examine the displayed bitmap us¬ 
ing cursor control keys and deter¬ 
mine if the scan is satisfactory; 

(3) enter Recognition mode where, af¬ 
ter preprocessing takes place, you 
see one character at a time and can 
interactively correct each unrecog¬ 
nized symbol; 

(4) during or after this phase, provide 
Catchword with redefined models 
to correct any totally incorrect 
model/objects it might have used; 

(5) enter the ASCII text editor to 
make final corrections; and 

(6) save the file or convert it to the
format of your favorite word processor
(Word, Multimate, WordPerfect,
and Xywrite).

While these last two steps are not really 
necessary (you can probably go directly 
to your own word processor), we found 
them handy. 

We tested Catchword using the nicely 
laid out example in the easy-to-follow tu¬ 
torial (for reference, POCR’s documenta¬ 
tion is much more sketchy). We also 
tested Catchword with proportionally 
spaced columnar text from both newspa¬ 
pers and magazines and with some non- 
proportionally spaced laser printer out¬ 
put. While we didn’t get the 99 percent 
accuracy rate cited on the package, the 
first pass for all the trials yielded a con¬ 
sistent error rate — about 15 percent. 

OCR software can actually make two 
kinds of errors. One kind constitutes a 
clear error, substituting the wrong char¬ 
acters for misrecognized letters. The sec¬ 
ond kind involves the inability to recog¬ 
nize a character and consequently flag¬ 
ging those characters as unrecognized. 
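
If you have a known-good copy of a test passage, the two kinds of errors can be tallied separately. The sketch below is a hypothetical illustration. The "~" reject marker is our invention (the review does not say how either package flags unrecognized characters), and the comparison assumes the OCR output has the same length as the reference, with no dropped or extra characters.

```python
REJECT = "~"  # hypothetical marker for a character the software could not recognize


def tally_errors(reference, recognized, reject=REJECT):
    """Count substitutions and rejects, comparing position by position.

    ASSUMPTION: both strings have the same length.
    """
    if len(reference) != len(recognized):
        raise ValueError("strings must be the same length for this simple comparison")
    substitutions = 0
    rejects = 0
    for expected, got in zip(reference, recognized):
        if got == reject:
            rejects += 1          # flagged as unrecognized
        elif got != expected:
            substitutions += 1    # wrong character, not flagged
    return {"substitutions": substitutions, "rejects": rejects}


if __name__ == "__main__":
    print(tally_errors("optical", "opt~ca1"))
```

Substitutions are the more costly kind, because nothing in the output tells you where to look for them.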

Catchword made few errors of the for¬ 
mer type. As for the latter type, when it 
didn’t recognize a character, it seemed 
consistent in not recognizing the same 
character throughout the scanned docu¬ 
ment. Of course, following the recogni¬ 
tion stage, you can easily correct such er¬ 
rors because you are presented with each 
unrecognized character, one character at 
a time, for interactive corrections. Unfor¬ 
tunately, if you want to scan more mate¬ 
rial from the same source document, you 
cannot tell Catchword to remember the 
models that you taught it during the previous
scan/recognition. Another shortcoming
is that it doesn’t have an option
for bitmapped images for use by a paint
package. You must use Logitech’s Paint- 
show Plus software, which comes 
bundled with the scanner. 

In experimenting with Catchword, we 
discovered (as the manual states) that the 
quality of the scan can affect the error 
rate. Because both Geniscan and Scan- 
man Plus have resolution and shading 
adjustments, we could vary the quality of 
the scanned image using these adjust¬ 
ments and also by varying the scanning 
speed. Nevertheless, even a “sloppy” 
scanning job resulted in definitely ac¬ 
ceptable results for both scanners using 
Catchword. 

Catchword has some bugs. When we 
tried to scan a page consisting of a few 
words (10 pitch in a nonproportional 
font) with many horizontal solid lines 
such as you might find on a form, the 
software crashed and returned to DOS. 
Fortunately, because Catchword doesn’t 
keep in memory very much already-cre¬ 
ated data, we didn’t lose much. Another 
drawback to the program is that it 
doesn’t use expanded/extended memory. 
Consequently, even though we had 6 
Mbytes of RAM on our AST 386 com¬ 
puter, we consistently ran out of memory 
when we tried to read a complete 
8.5x11-inch page. We had to resort to 
two scans per half page. Unfortunately, 
the learning that Catchword did for the 
first half of the page had to be repeated 
for the two overlapping scans in the sec¬ 
ond half. 

Recommendations. Despite these minor
annoyances and our initial skepticism
about the value of hand-held scanners
for serious applications, the combination
of Scanman Plus and Catchword
changed our minds. It’s really too bad 
that Catchword doesn’t have the features 
that POCR tries to incorporate. Learning 
mode, pitch/font selection, and TIFF and
ASCII formats would allow a Catchword
user to really fine-tune the performance 
of a scanning task. On the other hand, 
it’s too bad that POCR doesn’t recognize 
characters as well as Catchword does. If 
it did, it would be a super product. 

Considering that the total street price 
of the Logitech scanner and software is 
probably well under $300, we can rec¬ 
ommend this system for low-production 
OCR tasks. 

Catchword: Reader Service 25 

Readright 2.01

Our first sampling of OCR Systems’ 
Readright came at Comdex 89. A Chinon 
DS-3000 coupled with this software pro¬ 
duced fast and accurate results from 
nearly anything fed in. Pages from manu¬ 
als, vendor literature, and even the 
Comdex program itself were scanned 
rapidly with impressive results. Of 
course, we remained somewhat skeptical 
and felt the need to test this dynamic duo 
under more typical conditions. 

What are typical conditions? In one 
case it meant taking a paper document 
filed many years ago and now in need of 
revision. In another, it was a LaTeX file 
previously sent to a Postscript printer 
and now needed for posting on an elec¬ 
tronic bulletin board. A third example 
was a fax message slated to become the 
source for a paper in preparation. In each 
case, we wanted to avoid retyping the 
original. 

How did it work out? Very well. In the 
first case, in a matter of minutes we 
scanned in a three-page, monospaced 
document originally prepared on a type¬ 
writer, passed it through Readright, and 
stored it as an ASCII file. We then im¬ 
ported it to a word processor and spell 
checked it to remove the few misreads 
that had occurred. 

In the second case, the document was 
a proportionally spaced, 8-point typeface 
that came off a heavily used and very 
tired Postscript printer. Despite the light, 
low-contrast output, Readright read the 
document quickly and accurately after a 
couple of trial runs to select the proper 
settings (we set Readright for “very 
light” and proportionally spaced text). 
Because the LaTeX file had been trans¬ 
mitted by e-mail, transmission errors out¬ 
numbered OCR errors by a margin of 
two to one. In other words, the OCR 
software performed better than our e- 
mail transmission had. 

In the third case, we were not so 
lucky. Readright will only handle fine¬ 
mode fax images, and we had received
our fax in the lower, standard resolution.
We hadn’t read the manual before we 
started using the OCR software, so it re¬ 
ally wasn’t the package’s fault. To make 
a fair test, we simulated a transmission 
(we scanned the image in high-resolution 
mode to the fax package and used it as 
the source for Readright). The simulated 
transmission contained 2,912 characters. 
Of those, Readright marked seven as un¬ 
identified. Five were in error but not 
marked, and one letter was capitalized by 
mistake. This translates to a recognition 
rate of 99.55 percent — a very impres¬ 
sive result. 
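
The 99.55 percent figure can be reproduced from the counts given above. The short sketch below assumes that all three kinds of mistakes (marked as unidentified, wrong but unmarked, and wrongly capitalized) count equally as misreads; the function and parameter names are ours, not Readright's.

```python
def recognition_rate(total_chars, unidentified, unmarked_errors, other_errors=0):
    """Return the percentage of characters read correctly.

    Assumption: every kind of misread counts once against the total.
    """
    misreads = unidentified + unmarked_errors + other_errors
    return 100.0 * (total_chars - misreads) / total_chars


# Counts from the simulated fax test: 2,912 characters, 7 marked as
# unidentified, 5 wrong but unmarked, 1 capitalized by mistake.
rate = recognition_rate(2912, unidentified=7, unmarked_errors=5, other_errors=1)
print(f"{rate:.2f} percent")  # 99.55 percent
```

In other words, 13 misreads in 2,912 characters is roughly one error every 224 characters.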

Of course, we made other tests of 
Readright, as well. To begin with, the 
manual is largely tutorial. Once you’ve 
gone through and read what this software 
does and how to install it on your ma¬ 
chine, you are ready to scan and recog¬ 
nize the six sample documents included 
with the manual. What better way to 
learn than by doing? You quickly find 
that the software can read in both land¬ 
scape and portrait modes, that it can 
handle single or multiple pages, and 
that it can even handle different type¬ 
faces set in multiple columns on the 
same page. 

But what really impressed us was the 
large number of output formats (16 of 
them in version 2.01), so you can transfer 
the converted file to your word processor 
without losing text and layout attributes. 
This means that bold and underlined text 
in either serif or sans serif typefaces with 
point sizes ranging from 6 to 16 points 
(for your typical 300 dpi scanners) can 
be read and saved while preserving line 
spacing, indents, tab settings, justifica¬ 
tion, columnar attributes, and paragraph 
breaks when transferring the results to 
your applications (such as a word pro¬ 
cessor, database system, or simple text 
editor). 

The product. Readright’s minimal 
system requirements, plus support for a 
large number of input devices and an 
impressive list of features, make it ex¬ 
tremely versatile. Readright requires 
only a compatible PC with 640 Kbytes, a 
hard disk drive, a floppy disk drive, and 
DOS 3.0 or higher. You will find ex¬ 
panded or extended memory a definite 
plus, easily obtained either by using the 
drivers included with the package or run¬ 
ning it under Windows (286 or 386). The 
software directly supports 13 different 
page scanners (including Abaton, AST, 
Canon, Chinon, The Complete PC, HP, 
Kyocera, Microtek, Panasonic, Pentax, 
Ricoh, Taxan, and Umas), as well as 
hand-held scanners from Logitech and 
The Complete PC. As mentioned before, 
it can also read 200x200 dpi fax images 
in TIFF or PCX format. 




Source documents can be typeset, 
typewritten, laser, or dot matrix (both 
draft and near letter quality) output, in 
both monospace and proportionally 
spaced form. The program recognizes 
bold and underlined characters and pre¬ 
serves formatting in ASCII (paragraph, 
CR/LF, tabular, mail list, or WYSIWYG 
formats), dBase, DCA-RFT, Lotus, MS 
Word, WordPerfect, Wordstar, Microsoft 
RTF, Xywrite, and Excel text file for¬ 
mats. You can scan documents in both 
portrait and landscape modes with a skew 
tolerance up to 0.25-inch per line. Recog¬ 
nition speeds depend on processor speed, 
but reputedly go as high as 500 words per 
minute on a 20-MHz 386 machine.

Using Readright. Installation is auto¬ 
matic using the Install program. Four 
5.25-inch or two 3.5-inch disks contain 
the recognition software as well as the 
scanner support. We specified the Chinon 
DS-3000 scanner and drive D. The soft¬ 
ware when loaded took a little more than 
1.7 Mbytes of hard disk space. 

After you start Readright, the opening 
screen reminds you to connect and turn 
on the scanner (if you use one — you can 
certainly read documents supplied in a 
compatible image TIFF or PCX format). 
The main menu lets you specify the paper 
length, print contrast, scanner resolution, 
use of an auto-feeding scanner, text file 
format, single- or multiple-page docu¬ 
ments, symbol to substitute for a 
misidentified character, and whether to 
retain the column and text attributes of 
the source document. 

Three other menus, called the frame, 
utilities, and setup menus, offer addi¬ 
tional capabilities. In the frame menu you 
can prescan the document and then create 
a frame around the part of it that interests 
you. This is particularly useful when the 
document includes material not recogniz¬ 
able as text, such as pictures, or not of 
interest, such as headlines or parts of 
other stories in a multicolumn document. 


The utilities menu lets you perform DOS 
functions, such as display a directory of 
your files, delete or rename a file, and 
run an application. This menu also lets 
you work with image files (to scan, dis¬ 
play, and read them) or save and recall 
settings set from the main menu (such as 
page length, print contrast, frame size, or 
image storage format). The setup menu 
allows you to specify some of these set¬ 
tings, including the saved image format, 
the units, the display adapter type (gener¬ 
ally set to auto-detect), and colored 
menus. 

Having set things up, the only time 
you will change them is when the image 
you want to scan changes in size, type 
style, or contrast. You get to experiment 
to find out what works best. After Read- 
right begins scanning and translating 
your document, a scanning bar appears 
near the top of the main menu. The bar 
provides a graphic indication of the por¬ 
tion of your document already scanned, 
brightening to show how much of the 
document has been converted into an im¬ 
age. Another, even bolder, bar then 
shows how much has been converted into 
text. The generated text appears below 
the graphic display so that you can abort 
the process, select new settings, and im¬ 
mediately decide if the results are better 
or worse. 

Recommendation. To quote from the 
manual that comes with this software: 
“...Readright uses a topological tech¬ 
nique to recognize each character by its 
shape.... Many other recognition prod¬ 
ucts use a matrix-matching technique... 
[for which] you must laboriously ‘train’ 
the system to recognize each character. 
Matrix-matching systems tend to be 
slow, difficult to use, and easily con¬ 
fused by similarly-shaped characters.” 

We concur, having used both types of 
systems. 

We found Readright to be everything 
claimed and more. It saves valuable time 
often wasted in retyping hard-copy docu¬ 
ments. It is so intuitively easy to use that 
you might never read the excellent man¬ 
ual that comes with it. Finally, Readright 
makes it feasible to store information in 
hard-copy form and painlessly convert it 
back into machine-readable form for fu¬ 
ture reference. All of this from a product 
that retails for $495 and does not require 
any prior training! We rate Readright 
highly recommended and definitely a 
best buy. 

Readright is available from OCR Systems,
1800 Byberry Rd., Suite 1405,
Huntingdon Valley, PA 19006, phone
(215) 938-7460 or (800) 233-4627, fax 
(215) 938-7465.

An assist to Asyst

The Asyst software package is a PC- 
based system for data acquisition, analy¬ 
sis, and graphics. Although a number of 
commercial offerings do subsets of what 
Asyst does, and some even do certain 
things better, no system is as comprehen¬ 
sive as Asyst. In particular, it supports 
virtually all signal processing, statistical, 
and graphics routines typically required 
by scientists and engineers, even for ad¬ 
vanced applications. It is extremely con¬ 
venient to find all of these in one inte¬ 
grated package. 

Until now, you had to pay a price in 
Asyst for this level of sophistication and 
versatility. Namely, to secure the desired 
results, you had to design programs us¬ 
ing the very high level language pro¬ 
vided, and the constructs of this language 
are cumbersome. The language uses 
postfix notation, which is logically cor¬ 
rect but nonintuitive, and it’s rather un¬ 
forgiving. For example, DIM[ 100 ] will 
work but DIM [100] and DIM [ 100 ] will 
result in an error message. 
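
For readers who have not met postfix notation, the following is a minimal sketch of how a postfix expression is evaluated with a stack. It is written in Python, not in the Asyst language, and the function name and the four operators are assumptions chosen for illustration. Because tokens are separated only by white space, it also hints at why spacing can matter so much in a language of this kind.

```python
import operator

# Hypothetical operator table for the illustration.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_postfix(expression):
    """Evaluate a whitespace-separated postfix expression."""
    stack = []
    for token in expression.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError("operator without two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


# The infix expression (3 + 4) * 2 written in postfix form.
print(eval_postfix("3 4 + 2 *"))  # 14.0
```

The operands come first and the operator last, which is unambiguous for the machine but takes some getting used to for the person typing it.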

Asyst Version 3.0 (see the March 1989 
issue, pp. 81-84, for our first look at 
Asyst) circumvents these difficulties.
This version contains Easy Coder, which
facilitates application programming; a
DOS shell for performing DOS opera¬ 
tions without exiting Asyst; language in¬ 
terfaces to Microsoft C and Fortran to 
enable the running of external programs 
from within Asyst and to support mixed 
language programming; and menuing, so 
program authors can convert their exist¬ 
ing programs to menu-driven programs. 

The jewel by far in the set of additions 
is Easy Coder. With this capability, you 
merely fill in menu prompts; the system 
automatically generates the code, in the 
Asyst programming language, to produce 
the desired results. It is nearly impos¬ 
sible to exaggerate the importance of this 
feature. It frees you from dealing with 
the syntax of the language in several 
ways. I’ll mention some representative 
advantages. 

First, the overlays that contain certain 
functions are called automatically, so 
you don’t have to remember where 
things are located. Second, the only typ¬ 
ing involved in file creation, reading, and 
appending is the file name; the system 
supplies the usual file maintenance com¬ 
mands. Third, vuports (segments of the 
screen where graphics will be displayed) 
are greatly simplified, since they can be 
designated by drawing and moving boxes 
by means of the arrow keys and respond¬ 
ing to a clear menu of parameters related 
to axes and plot attributes. A nice acces¬ 
sory to this is that as you insert or 
change parameter values in the menu, the 
graph produced by the current set of pa¬ 
rameter values is displayed and updated 
on the screen to the right of the menu. 
Fourth, statements to control acquisition 
tasks and to set up GPIB devices are 
similarly generated automatically follow¬ 
ing a menu entry session. Fifth, all array 
operations, such as cross sections and 
catenating, and all waveform operations, 
such as Fourier transforms and correla¬ 
tions, require little more from you than 
supplying array names. Finally, a utilities 
feature (callable at any point in the sub¬ 
menus) takes care of the code for creat¬ 
ing and initializing variables, including 
those pesky dimension statements. 

Easy Coder is organized into main 
menus and layers of submenus. A menu 
field initially contains a default value, 
when there is a value that has a greater 
expectancy of being selected than any 
other. Where appropriate, a submenu has 
a Generate Code option. When you select 
this option, the code is inserted at the 
end of the program under development. 

A help menu describes each menu choice 
in detail, and mouse support is built in, 
making the menus even more accessible. 

Easy Coder is a true applications generator
that at the very least produces a
robust first pass at the desired code, with 
comments, which you can then modify 
through conventional programming. The 
Asyst people are to be congratulated for 
providing an impressive solution to a 
common complaint about their system. 

On a less technical but not inconse¬ 
quential note, the documentation has also 
been improved. The manuals are smaller 
in size, lighter, and spiral-bound, making 
them easier to manipulate for consulta¬ 
tion. Three new manuals have been 
added to the six-manual collection. There 
is an “up and running” manual for the 
Easy Coder analogous to the original “up 
and running” manual, a manual describ¬ 
ing the new feature enhancements, and a 
handy dictionary of Asyst words. 

I personally have used a previous ver¬ 
sion of Asyst for two very complicated 
biomedical applications requiring atypi¬ 
cal signal processing and statistical pro¬ 
cedures, and flexible graphics. I have 
screamed and cursed over the unwieldy 
mental acrobatics that it demands, but 
each time I temporarily defected to an¬ 
other system, I soon came scurrying back 
to Asyst because, even with its quirky 
demands, it was the only one that could 
do everything I needed. With Version 3.0 
and the Asyst Easy Coder, you can now 
have it all — a full-bodied acquisition 
and analysis system and a simple way of 
harnessing its power. 

All these new features require lots of 
memory. In the past, I was able to have 
my LAN software coresident with Asyst; 
not any longer. You need a full 640 
Kbytes and, while a pair of floppy disk 
drives should work, you really need a 
hard disk drive. A graphics adapter is re¬ 
quired (CGA, EGA, Hercules, or VGA) 
along with a math coprocessor. Asyst is 
designed to work automatically with ex¬ 
panded memory. It requires a parallel 
port for the copy protection block. 

List prices for Asyst start at $1,695 for 
the basic modules and go to $2,295 for 
the full set of four modules. A new 
multiuser pricing scheme gets that price 
down to as low as $895 per user for the 
whole system. Also, there is a very rea¬ 
sonable set of upgrade prices for users of 
earlier versions (from $195 to $595). 

Asyst is still not for the faint of heart.
It is for serious researchers who are not 
afraid to get their hands dirty to get what 
they want. But I kind of like that about 
it. Contact Asyst Software Technologies, 
100 Corporate Woods, Rochester, NY, 
14623, phone (716) 272-0070, fax (716) 
272-0073.

Transferring technology is tough


Ware Myers, Contributing Editor 

“Software development is far more dif¬ 
ficult than most people believe,” asserted 
Dines Bjørner of the Dansk Datamatik
Center and Larry Druffel of the Software 
Engineering Institute in a position paper 
for a panel discussion on industrial expe¬ 
rience with formal methods during the 
12th International Conference on Soft¬ 
ware Engineering in Nice, France, March 
26-30. Formal methods were developed 
by researchers to reduce the level of diffi¬ 
culty. 

The panelists, all of whom were se¬ 
lected for their experience with these 
methods, agreed that formal methods did, 
indeed, help. Formal methods uncovered 
problems not found with other methods, 
said Deborah Cooper of Unisys. They 
helped to maintain specifications, sim¬ 
plify design, and generate test cases. 

The bad news is that they are not well 
accepted by the software community. 
They proved very difficult to transfer into 
our divisions, reported Derek Coleman of 
Hewlett-Packard in England. 

And that exemplifies one of the very 
big problems of the software engineering 
community: technology transfer. “There 
is a growing awareness that while many 
innovative program-design methodolo¬ 
gies have been developed in recent years, 
few of these innovations are being widely 
used within actual software development 
environments,” James D. Babcock,
Laszlo A. Belady, and Nancy C. Gore of 
the MCC Software Technology Program 
wrote in a conference paper. “The suc¬ 
cess of the discipline of software engi¬ 
neering itself will depend at least as much 
on discovering and implementing effec¬ 
tive technology-transfer strategies as on 
creating new design technology.” 

The problem is recognized in Europe as 
well. “On the one hand, our technology is 
inadequate to meet the demands, in par¬ 
ticular, the quality demands of future sys¬ 
tems,” noted David Talbot, head of Di¬ 
rectorate General XIII of the Commis¬ 
sion of the European Communities, in an 
invited lecture. Talbot is responsible for
administering Esprit research in information
technology and telecommunications.
“On the other hand, our rate of technology
deployment is insufficient to satisfy
the demand of current systems.”

Talbot is trying to establish a better balance
between the clear technology push
that characterized Esprit activities in the
early part of the 1980s and the now
equally clear need to create an increased
level of technology pull by prospective
users.

In the next few years, Esprit hopes to
move software engineering from an
ad hoc craft to an industrial process
and practice, said David Talbot of the
Commission of the European Communities.
He intends to put more emphasis
on the promotion of results from the
research, getting these results into
practice.

Traditional transfer mechanisms 
not enough. “When organizations seek 
to acquire new technology, they often se¬ 
lect transfer mechanisms based on a sim¬ 
plistic view of what it takes to move tech¬ 
nology into routine use,” Priscilla J. 
Fowler of the SEI told the panel on Expe¬ 
rience Using Defined Processes for Tech¬ 
nology Transfer. This panel had partici¬ 
pants from five countries — the problem 
is widely recognized. 

It is evident that reading articles, pa¬ 
pers, and technical reports, and even at¬ 
tending short training courses, is not 
enough. Fowler cited the work of 
Dorothy Leonard-Barton on a process of
mutual adaptation — “...the reinvention 
of the technology and the simultaneous 
adaptation of the organization.” She also 
mentioned Louis G. Tornatzky and his
associates, who described a life cycle of 
technology adoption independent of the 
life cycle for developing the technology 
itself. A third researcher, Bela Gold, 
wrote on the need for successive adapta¬ 
tion adjustments to the organization 
adopting a new and complex technology, 
Fowler added. 

“So it is clear that an overly simple ap¬ 
proach is likely to meet with only the most 
limited success,” Fowler indicated. 

Technology must be received. “The 
phrase, technology transfer, implies 
movement of technology from one place 
to another,” Fowler continued. “It also 
implies that the technology is actually 
used in everyday work once it is moved.” 

To implement the mutual-adaptation 
process that the researchers she cited 
have identified, there must be collabora¬ 
tion between those who know the tech¬ 
nology and those who know the context 
into which the technology will go. Fowler 
calls the latter “technology receptors.” 
Others have termed the people on this 
interface “boundary spanners.” 

“This technology receptor group — 
together with managers and engineers 
from software development organiza¬ 
tions — tracks, screens, installs, and 
evaluates new methods and technology 
for improved software engineering prac¬ 
tice within an organization,” Fowler ex¬ 
plained. “It also works to orchestrate 
process improvement activities such as 
process maturity assessment, process da¬ 
tabases, and process education and train¬ 
ing.” 

The SEI itself may be thought of as a 
very large receptor group for the US De¬ 
partment of Defense. “Among the more 
robust and interesting examples [of re¬ 
ceptor groups] the SEI has found are 
groups at the IBM Systems Integration 
Division (one of the pioneers of the idea), 
Hewlett-Packard, Westinghouse Elec¬ 
tronic Systems Group, Contel Federal 
Systems, and AT&T Bell Labs,” Fowler said.

The panel chair, Kurt Fischer of the
Contel Technology Center, pointed out 
that the transfer process “in a way [is] 
analogous to the software development 
process, with identified phases, organ¬ 
izational responsibilities, intermediate 
products, quality standards, and quanti¬ 
tative measurements. Once defined, the 
plan must then be implemented and con¬ 
trolled. As with other business processes, 
technology transfer can be periodically 
reviewed to assess progress and to rectify 
deficiencies.” 

Getting management on board. It is a
truth universally acknowledged — from 
Adam Smith’s time to our own — that, 
when members of a profession gather together,
someone in the back row will angrily
blame management for the ills currently
besetting them. True enough, there
are conflicting pressures playing on soft¬ 
ware management — schedule, funds, 
personnel, customer changes, etc. — and 
sometimes these pressures do prevent the 
use of better technology. Certainly man¬ 
agement has to play a central role in the 
technology-transfer process. 

To introduce the use of formal, or 
Fagan, inspections at Caltech’s Jet Pro¬ 
pulsion Laboratory, for example, mem¬ 
bers of the Software Product Assurance 
Section “spoke personally to every major 
JPL project manager and many technical 
line managers about [their] value,” Mari¬ 
lyn Bush noted in one of ICSE’s innova¬ 
tions, an experience report. “This en¬ 
tailed close to 200 formal and informal 
discussions.” 

Then, the section organized a confer¬ 
ence at which representatives of outside 
institutions recounted their experience 
with formal inspections. One of the 
speakers was Michael Fagan himself, 
who developed a seven-step process in 
the 1970s designed to find, fix, and docu¬ 
ment defects in technical documents and 
source code. At the conference, JPL’s 
deputy director discovered some inter¬ 
ests in common with Fagan, so he at¬ 
tended the next managers’ class, a two- 
hour overview of the 12-hour course for 
developers. Of course, word of his inter¬ 
est spread and other senior managers at¬ 
tended the class as well. 

In the next 21 months, 300 inspections 
found an average of four major and 12 
minor defects in 38 pages of documenta¬ 
tion or code. Ten projects have adopted 
the technique and five more have made 
plans to do so. “We hope to find and fix 
close to 75 percent of all defects before a 
project enters the test phase,” Bush said. 

“We have explicitly scheduled experience
reports at this conference,” Peter
Freeman of George Mason University
and the University of California Irvine,
program co-chair, pointed out in
a post-conference interview. “The attendance
at these sessions has been the
largest of any sessions. There is a hunger
on the part of the attendees for experience.”

In another experience report, Tom
DeMarco of the Atlantic Systems Guild
and Curt Geertgens of Aldus told of a
management problem we should all be so
lucky as to have. The original designers
of Aldus’s product, Pagemaker, have
now achieved substantial wealth.

“One effect of this sudden new wealth 
was impatience with the idea of sitting 
down to hundreds of hours of drudgery to 
document the code,” DeMarco observed. 
No one can tell them they have to, because 
they own the company. At the same time, 
the staff is up to 60 software builders and 
the new people, in particular, need qual¬ 
ity internal documentation. “The average 
new hire required from four to six months 
to become acclimated with his or her as¬ 
signed product subsystem.” 

It turned out that these wealthy devel¬ 
opers were willing to go on camera and 
record what they had done on videotape, 
with 

(1) each video shot in 10-45 minutes, 
one take per topic; 

(2) a small live audience (10-12 
people), made up of other experts 
to ensure accuracy, complete cov¬ 
erage, and good questions; 

(3) a question-and-answer period off 
camera, followed by an on-camera 
recap; 

(4) guest speakers to present some 
subtopics; and 

(5) an overhead projector and white 
boards for graphics and
point outlines. 

The technique works. “To date, ap¬ 
proximately half of the company’s devel¬ 
opers have viewed the videos,” DeMarco 
said. “Many users reported they viewed
all or some of the tapes more than once.”

“The designer’s intent was clearer in
the videos than in any written design
documentation I’ve ever seen,” one user
said in response to a survey questionnaire.
“Being able to hear in the designer’s
own words his point of view, focus,
approaches evaluated and rejected,
etc., provides a framework in which new
design decisions can be made.”

“The conference has been balanced between
experience reports and research
papers,” Marie-Claude Gaudel of
LRI-Universite de Paris-Sud, program
co-chair, noted. “In five years we
hope to get experience reports based on
some of the research reported this
year.”

MCC moves toward a collaborative 
process. MCC has grappled with the 
technology-transfer issue since its incep¬ 
tion in 1983. Its success as a research con¬ 
sortium depends upon getting its research 
output into use by its shareholders. The 
Software Technology Program has de¬ 
veloped and refined a number of commu¬ 
nication- and education-oriented meth¬ 
ods for technology transfer, such as re¬ 
view sessions, workshops, and video 
conferences. So its methods are not 
overly simple and not unsophisticated. 

Yet, “in and of themselves, they tend to 
lead to only sporadic use of STP tech¬ 
nologies on real development projects,” 
Babcock, Belady, and Gore reported. 
“Widespread diffusion requires a new 
approach to technology transfer and an 
expanded methodology for achieving it.” 

MCC found the key in “the collabora¬ 
tive model of research — a model pre¬ 
scribed early in STP’s life to increase the 
responsiveness of STP research to share¬ 
holder needs,” the report went on. “In this 
view, STP’s responsibility extends be¬ 
yond the release of STP prototypes, to
bridging the gap through what we call
‘industrialization.’” 

Industrialization means either inte¬ 
grating a new technology into a process 
used by shareholder designers to create 
profit-generating products or turning an 
innovation into a profit-making product 
in its own right. 

“No longer was ‘technology transfer’ 
merely the act of passing on a prototype as 
though it were a baton in a footrace,” the 
report said. “Our original commitment 
had all along necessitated that we join in 
on the latter part of the race as well.” 

The Software Technology Program 
came to understand that one of its tech¬ 
nologies must be not just tossed into the 
air like a homing pigeon, but separated 
from the STP environment and carefully 
fitted into that of the shareholder. That 
working together throughout the entire 
cycle of research and transfer, STP calls 
collaboration. 

Metrics tells you where you are. “If 
you do not know where you have been and 
where you are, then it is difficult to know 
where you are going,” JPL’s Marilyn 
Bush noted in her second experience re¬ 
port. In 1985, the deputy director com¬ 
missioned a software intensive system 
study that led to the System Software and 


CAIA focuses on recent 

Bob Carlson, Staff Editor 

Although negativism and suggestions 
that AI has failed permeate the media, the 
facts differ from the perception. So said 
General Chair Mark Fox of Carnegie 
Mellon University March 7 when he wel¬ 
comed attendees to the Sixth IEEE Con¬ 
ference on Artificial Intelligence Appli¬ 
cations sponsored by the IEEE Computer 
Society in Santa Barbara, California. 

Clearly, the supposed “winter of AI” 
failed to cool the interest of the research¬ 
ers and engineers who gathered March 5- 
9 to hear about the latest developments in 
both AI theory and practice. 

Tutorials and paper sessions were sup¬ 
plemented by invited talks and panels 
covering subjects from robots to Lisp ap¬ 
plications to funding for AI. According to 
Program Committee Chair Se June Hong 
of the IBM T.J. Watson Research Center, 
the 44 accepted papers (out of 192 sub¬ 
mitted) were very high in quality. Of the 
three tracks, Engineering/Manufactur¬ 
ing attracted the most submittals, but 
Business/Decision Support and Enabling 
Technology were well represented. 

Searching large databases. In a ple¬ 


Operations Resource Center in 1986 and 
the Software Product Assurance Section 
in 1987. As JPL improved its software 
processes, more comprehensive and 
standardized metrics became essential. 
One result of good metrics is the ability to 
determine whether a particular transfer 
of technology has been successful in im¬ 
proving productivity and quality. For this 
purpose JPL needed some baseline data. 

“The first objective [of the metrics 
study] was to collect whatever data could 
be found from as many projects for as 
many years as was still available,” Bush 
said. The data collected was source lines 
of code, dollars, workyears, and defects. 
Data was found representing almost 5 
million lines of code, about 30,000 
workmonths, and over $333 million 
worth of work, going back as far as 15 
years. 

“On average, it has cost $1,149 for each 
new line of code for JPL flight systems, 
and JPL has produced approximately 10 
new lines of flight system code per 
workmonth,” Bush reported. “Ground 
systems cost on the average $67 for a new 
line of code, and the lab has. produced 
about 186 lines of ground system code per 
workmonth.” 

These averages are somewhat worse 
than other NASA and US Air Force fig¬ 


ures that Bush cited but, because of the 
difficulties of retroactive data collection, 
the JPL figures probably have a large 
margin of error. 

With respect to quality, “JPL averaged 
about 8.6 defects per thousand lines of 
code for flight software and about 2 for 
ground software,” she stated. She indi¬ 
cated that these results are a lower bound, 
however, as many defects probably did 
not get recorded. 

Despite the difficulties, Bush believes 
that JPL now has a rough measurement 
foundation for assessing productivity 
and quality gains. 

Sponsors, chairs. The IEEE Com¬ 
puter Society Technical Committee on 
Software Engineering, ACM’s Special 
Interest Group on Software Engineering, 
and Association Francaise pour la Cy- 
bemetique Economique et Technique 
sponsored ICSE 12. Francois-Regis Val- 
ette of ONERA-CERT served as general 
chair, and Peter Freeman and Marie- 
Claude Gaudel were the program co¬ 
chairs. 

The proceedings, order No. 2026, is 
available from the Computer Society 
Press, Los Alamitos, California, by call¬ 
ing (800) CS-BOOKS or (714) 821-8380 
in California. 


developments in AI applications 


nary address titled “Massively Parallel 
Text Processing,” David L. Waltz of 
Thinking Machines described the work 
his company is doing on systems for 
searching very large databases, and in the 
process he outlined the architecture of the 
company’s Connection Machine. 

The principle behind the massively 
parallel Connection Machine system 
used in database searches is its reliance 
on both code and data as potential sources 
for parallel speedups. A very large data¬ 
base of documents can be searched for a 
pattern more quickly if many simple 
processors are queried. 

Searching large databases is an appro¬ 
priate application for parallel operation 
because as databases get larger the cost- 
effectiveness of a serial machine falls off, 
Waltz explained. With a parallel ma¬ 
chine, cost remains constant. 

It is helpful to generate fairly large que¬ 
ries, he said, because they can be pro¬ 
cessed very easily and have some highly 
desirable properties. For instance, the 
larger the query, the greater the likeli¬ 
hood of getting a good match, not just an 
apparent match based on a few superficial
words. The goal should be not only
retrieving the pertinent documents but 
also ordering them so that the most im¬ 
portant are retrieved first. 

The company built a system for Dow 
Jones that has been on line for over a year. 
It includes two 32,000-processor Con¬ 
nection Machines that are completely re¬ 
dundant, each with a data vault attached. 
One unit is the on-line system, which 
does a continuous “dribble,” nonperma¬ 
nent update of the database. The other 
unit serves as a backup and also does the 
permanent database updating off line. 
The database has 300 sources of informa¬ 
tion, consisting primarily of business 
publications. 

Interprocessor communication is just 
one of the major issues in this growing 
field. There is also the separate and diffi¬ 
cult problem of text preparation for 
searching. In the system Waltz described, 
words that aren’t useful are removed, and 
scores are assigned to the remaining 
words. Common words get a low score 
because they appear in many documents; 
unusual words get a higher score because 
they help limit the number of documents
that may be of interest.

Se June Hong of IBM was CAIA 90 Program
Committee chair. The Computer
Society sponsored the March 5-9 event.

The system searches documents in 
chunks down to the 30-word level so that 
larger documents, which naturally con¬ 
tain more of the search words, don’t un¬ 
fairly outweigh shorter documents. The 
search can be refined with subsequent 
terms. Once a good article is found, its 
words can be used to form another query. 
Waltz claimed that a user can learn to 
search with the system in about 30 sec¬ 
onds. 
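
The scoring and chunking Waltz described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Thinking Machines software: the stop list, the logarithmic score formula, and the function names are all hypothetical, and the code runs serially rather than on a parallel machine. It drops unhelpful words, gives rarer words higher scores, and ranks each document by its best 30-word chunk so that long documents gain no advantage from sheer length.

```python
import math
import re
from collections import Counter

# Hypothetical stop list; the article does not say which words were removed.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}

def words_of(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def word_scores(documents):
    """Give each non-stop word a score that falls as more documents contain it."""
    doc_freq = Counter()
    for text in documents:
        doc_freq.update(set(words_of(text)) - STOP_WORDS)
    n = len(documents)
    # Assumed formula: any score that decreases with document frequency would do.
    return {w: math.log(1 + n / df) for w, df in doc_freq.items()}

def best_chunk_score(text, query, scores, size=30):
    """Score a document by its best chunk of `size` consecutive words."""
    wanted = set(words_of(query)) - STOP_WORDS
    words = words_of(text)
    best = 0.0
    for start in range(0, len(words), size):
        found = wanted & set(words[start:start + size])
        best = max(best, sum(scores.get(w, 0.0) for w in found))
    return best

def rank(documents, query, size=30):
    """Return document indexes, best match first, dropping non-matches."""
    scores = word_scores(documents)
    scored = [(best_chunk_score(d, query, scores, size), i)
              for i, d in enumerate(documents)]
    return [i for s, i in sorted(scored, key=lambda t: (-t[0], t[1])) if s > 0]
```

Feeding the words of a good article back in as the `query` argument gives the kind of query refinement mentioned above.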

Software creation for a working system
is formidable. Waltz said that the
software the company built and placed in 
commercial use was about a five-person- 
year effort. Approximately 75 percent of 
the code in the system deals with error 
handling and exception handling. 

Currently, systems are limited by the 
machine’s memory size, which is about 8 
gigabytes, or 25 gigabytes allowing for 
compression. The company is prototyp¬ 
ing software to handle larger databases 
and is using an inverted file method to get 
around the memory limitation. 

Waltz foresees a terabyte machine us¬ 
ing a single Connection Machine and 64 
data vaults. He estimates a 3.5-second re¬ 
sponse time on 100 gigabytes of text us¬ 
ing 8,000 processors or on a terabyte of 
text using 64,000 processors. Yet another 
project involves the possibility of auto¬ 
matic hypertext database building. 

Waltz pointed out the connection be¬ 
tween systems of this magnitude and 
memory-based reasoning. Using expert¬ 
like behavior, such systems might com¬ 
pare a new case to a database of classified 
precedents to arrive at, for example, a 
medical diagnosis. The result, in effect, is 
an expert system without the need for a 
knowledge engineer. And without rules 
to change, such a system could be more 
easily updated through the addition or de¬ 
letion of cases. 
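
Memory-based reasoning of this kind amounts to a nearest-neighbor lookup over stored cases. The Python sketch below is an illustration only: the overlap measure, the voting scheme, and all names are assumptions rather than anything Waltz presented, and any case data supplied to it would be invented for the example.

```python
from collections import Counter

def similarity(a, b):
    """Fraction of features the two cases share (Jaccard overlap)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(new_case, precedents, k=3):
    """Label a new case by majority vote among its k most similar precedents.

    `precedents` is a list of (features, label) pairs; there are no rules,
    so updating the system means adding or deleting pairs in that list.
    """
    ranked = sorted(precedents,
                    key=lambda p: similarity(new_case, p[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Appending one more `(features, label)` pair to the list is all it takes to teach such a system a new kind of case, which is the ease of updating noted above.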



David Waltz of Thinking Machines was 
the plenary speaker March 8 at the con¬ 
ference in Santa Barbara, California. 


But, according to Waltz, AI research¬ 
ers don’t seem to be thinking in terms of 
larger systems. Instead he sees them con¬ 
centrating on running applications on the 
smallest machines possible. “Why aren’t 
AI people in the forefront of using large 
machines?” he asked. 

Analyzing business activities. In an 
invited talk on modeling and analyzing 
businesses, Richard Fikes, director of the 
knowledge-based systems program for 
Price Waterhouse, described a hypotheti¬ 
cal multipurpose “business understander 
system.” He sketched a scenario in which 
auditors would use the system to access 
information as they prepared for an on¬ 
site client visit. The system might, for ex¬ 
ample, locate articles from the trade press 
to provide background. 

One of the system’s capabilities would 
be to store and analyze data on industry 
norms, thus enabling it to point out 
anomalies in similar businesses and help 
auditors spot potential problems for 
clients. The system would go on to ana¬ 
lyze data on a client and even suggest spe¬ 
cific questions for the audit team to ask of 
particular people in the client’s organiza¬ 
tion. It might also offer advice aimed at 
bettering the company’s operating meth¬ 
odology. 

A business modeler should be able to 
deal with knowledge in three forms: fi¬ 
nancial models, operational models, and 
organizational models. AI is the logical 
technology, Fikes said, because it allows 
more sophisticated modeling and is bet¬ 
ter able to handle qualitative reasoning. 

AI systems need activity descriptions 
as input for comparing one activity 
against another, Fikes said. Each activity 
has expected results, enabling condi¬ 
tions, and triggering conditions. 
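
One way to picture such an activity description is as a small record with the three parts Fikes named. The Python sketch below is hypothetical: the class, the field types, and the comparison function are assumptions made for illustration and do not describe any Price Waterhouse system.

```python
from dataclasses import dataclass, field

@dataclass
class Activity:
    """An activity description with the three parts named in the talk."""
    name: str
    expected_results: set = field(default_factory=set)
    enabling_conditions: set = field(default_factory=set)
    triggering_conditions: set = field(default_factory=set)

def compare(a, b):
    """Report, part by part, what two activities share and where they differ."""
    report = {}
    for part in ("expected_results", "enabling_conditions",
                 "triggering_conditions"):
        x, y = getattr(a, part), getattr(b, part)
        report[part] = {"shared": x & y,
                        "only_first": x - y,
                        "only_second": y - x}
    return report
```

A comparison against an industry-norm activity would then surface any condition present in only one of the two descriptions.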

Richard Fikes of Price Waterhouse delivered
an invited talk during the
March 8 session of the annual event.

Groundwork for business-modeling
and business-analysis systems was laid
by the Robotics Institute at CMU, according
to Fikes. Efforts are continuing at the
Price Waterhouse Technology Centre in 
Menlo Park, California, where Walter 
Hamscher is working on Crosby, a sys¬ 
tem that checks dependencies and gener¬ 
ates alternative explanations. The center, 
which builds prototypes of next-genera¬ 
tion systems, uses some commercially 
available tools but is also developing its 
own internal tool system in order to have 
the source code necessary to make ongo¬ 
ing changes. 

Fikes predicts a high economic payoff 
from business-modeling and business- 
analysis systems over a wide range of 
business activities. But there remains a 
collection of hard challenges for AI re¬ 
searchers as they look for ways to inter¬ 
pret incoming data and better meet the 
needs of business. 

In an aside during the question-and- 
answer session, Fikes mentioned an ef¬ 
fort involving DARPA, the National Sci¬ 
ence Foundation, and the Office of Scien¬ 
tific Research that he called “a very excit¬ 
ing beginning” in the search for ways to 
share knowledge bases. One obvious 
need is for standards for accessing data 
and for communicating knowledge from 
one system to another. Fikes had taken 
part in a workshop in which participants 
discussed development of the necessary 
standards and sought to determine what 
other issues needed to be addressed. 
Committees were formed to work on the 
standards, and Fikes expects them to is¬ 
sue documents later in the year. 

Close encounters with AI. Though 
unable to agree on the extent to which the 
public currently comes into contact with 
AI, members of a panel convened to dis¬ 
cuss the topic viewed such human-com¬ 
puter interaction as inevitably more com¬ 
mon in the future. They also touched on 
some of the social-impact issues resulting
from ongoing developments in AI,
thanks largely to the efforts of Session 
Chair Gary Kahn of the Carnegie Group 
to keep the discussion on track. 

Anatol Gershman of Andersen Consulting
stated one of these issues succinctly:
“AI tools can help expand our capabilities
and thus improve our control of
our environment. They also expand the 
capabilities of people who are trying to 
control us.” 

It would be hard to find a better illustra¬ 
tion of Gershman’s statement than the 
Autopolis project described by panelist 
John Harper of University College Dub¬ 
lin. The main purpose of this European 
program is to advance traffic science by 
making it easier to detect and deter traffic 
violations. 

Under the system Harper described, 
automobiles would be equipped with 
transponders, and entrance points to 
roadways would have ID sensors. In ef¬ 
fect, the roads would become an elec¬ 
tronic mesh able to monitor individual 
automobiles. Strategically located bea¬ 
cons connected serially and to a main 
computer would function like police 
cars. 

Harper predicts a greater psychologi¬ 
cal effect on drivers because there would 
be less time between detection of a viola¬ 
tion and prosecution. The concept has 
been viewed favorably by police depart¬ 
ments, he said, but not by drivers. The 
problem, as Harper sees it, is that “they
need to be educated a little more.” He con¬
tends that the goals of the Autopolis proj¬
ect justify the perceived invasion of pri¬
vacy.

Gershman raised the issue of whom intel¬
ligent agents actually serve. For ex¬
ample, he foresees a merger of television
and computer technology that will place
multimedia devices in the average house¬
hold. One thing that’s sure to happen with
them, he said, is that “somebody will try
to sell us something.” In other words,
whose agent is the device?

The third panelist, Rudy Estrada, is
with the Internal Revenue Service’s
international unit. He has been working
on expert systems as well as on a compli¬
ance project with neural networks that
involves taxpayers living in other coun¬
tries. Although the IRS is entering the
area of machine learning, most of the
work done in its AI lab is in expert sys¬
tems. They are field testing an expert sys¬
tem to aid auditors in determining which
returns are most likely to result in produc¬
tive audits.

Although several IRS expert systems
focus on compliance, others are intended
to provide direct service to the taxpayer.
Turnover among representatives who an¬
swer phone inquiries has severely limited
“institutional memory,” so an expert sys¬
tem now helps front-line people answer
taxpayers’ questions. The current system
is two-tiered so that it can provide simple
answers or more complete explanations,
depending on the varying needs of, say,
individuals versus corporations.

Estrada thinks expert systems will gain
acceptance among taxpayers because
they will be perceived as fair and impar¬
tial. This perception of uniformity and
fairness was one of the key issues cited by
Gary Kahn, panel moderator.

Also important with the proliferation
of intelligent systems are questions of re¬
sponsibility and liability. Gershman ex¬
pects various AI products to have differ¬
ent levels of assumed liability that will
have to be known up front. For example, a
client expects a different degree of risk
when placing money with a stockbroker
than when entrusting it to an accountant.

Gershman specified some of the major
design issues that will affect the success
of AI products in the marketplace. In ad¬
dition to knowledge acquisition, there is
the burden of verification and validation,
which he said depends largely on system
specification. It is important for users and
designers to have similar expectations
regarding a system’s capabilities. Fur¬
thermore, for its products to gain accep¬
tance by the general public, industry must
be prepared to build complex systems for
heterogeneous users, with little or no
training required.

The conference proceedings can be ob¬
tained by specifying Order No. 2032
when contacting the Computer Society
Press, 10662 Los Vaqueros Cir., Los
Alamitos, CA 90720, phone (800) CS-
BOOKS. In California, dial (714) 821-
8380.


Symposium to focus on parallelism and distribution
in database systems

Rakesh Agrawal, IBM Almaden Research Center
David Bell, Institute of Informatics, Northern Ireland

There is growing recognition of the
importance of two related challenges for
database systems as we enter the 1990s.
One is to exploit the potential for flexibil¬
ity of access to dispersed data that may
preexist at nodes of a computer network
or are distributed for some other reason.
The other is to meet the often very strin¬
gent performance requirements for some
database queries by invoking the support
of parallel architectures based on recent
hardware and software developments.

The Second International Symposium
on Databases in Parallel and Distributed
Systems, slated July 2-4 in Dublin, Ire¬
land, is being organized to provide a fo¬
rum for database researchers and practi¬
tioners to share results of research into
new programming paradigms, data mod¬
els, and database systems designed to
harness parallelism and distribution in
data-intensive applications. The sympo¬
sium is sponsored by the IEEE Computer
Society, ACM SIGArch, the British
Computer Society, the Irish Computer
Society, and the US Office of Naval Re¬
search.

Jane Grimson of Trinity College and
Sushil Jajodia of George Mason Univer¬
sity are the general co-chairs, and Rakesh
Agrawal of IBM Almaden Research Cen¬
ter and David Bell of the Institute of In¬
formatics are the program co-chairs.

The symposium program will feature a
single track to ensure high quality in tech¬
nical content and a high degree of techni¬
cal interaction during the paper sessions.
Eighteen papers have been selected for
presentation from the more than 80 sub¬
mittals from authors in 13 countries.

Gene Lowenthal of Cooperative Com¬
puting will deliver the keynote address,
entitled “The Market Environment for
Database Machine and Servers.” Umesh
Dayal of Digital Equipment is organizing
a panel on “Key Directions in Parallelism
and Distribution in Database Systems.”

Four half-day tutorials will be offered
during the symposium. They are entitled
“Parallel Database Systems,” to be pre¬
sented by Andreas Reuter of Stuttgart
University; “Heterogeneous Multidata¬
base Systems,” by Witold Litwin of
INRIA and Marek Rusinkiewicz of the
University of Houston; “Performance
Evaluation and Modeling of Distributed
Database Management Systems,” by
Miron Livny of the University of Wiscon¬
sin; and “Parallelism in Logic Data¬
bases,” by Doug DeGroot of Texas In¬
struments.

For additional information, contact
Sushil Jajodia, ISSE Department,
George Mason University, Fairfax, VA
22030-4444, phone (703) 764-6192, In¬
ternet jajodia@gmuvax2.gmu.edu.


CALL FOR PAPERS


Fifth International Parallel Processing Symposium 

March 27-29, 1991 

Newport Beach Marriott Hotel & Tennis Club, CA 


This symposium is a forum for engineers and scientists throughout the world to present and exchange 
current work in all aspects of parallel and distributed processing. The symposium will consist of invited 
and submitted paper sessions, poster sessions, tutorials and commercial exhibits. Authors are invited to 
submit papers in all areas of parallel processing. Papers should demonstrate original unpublished research 
in parallel processing including development of experimental or commercial systems. Topics of interest 
include but are not limited to: 


• Parallel Architectures

• Parallel & Distributed Algorithms

• Performance Modeling & Evaluation

• Optical Computers

• Special Purpose Processors

• Neural Networks


• Interconnection Networks 

• Signal & Image Processing Systems 

• Fault Tolerance 

• Parallel Languages 

• Parallelizing Compilers 

• Application Specific Systems 


Send four copies of complete paper (not to exceed 15 single spaced pages) or a 1000 word summary to 
Program Chair. Please include Fax number and e-mail address. 

Professor V. K. Prasanna Kumar 
Dept. of Electrical Engineering-Systems, SAL 344
University of Southern California 
Los Angeles, CA 90089-0781 
Tel: (213) 743-5236, Fax: (213) 745-7284 
E-mail: ipps@ashoka.usc.edu 

Papers or Summaries must be received by September 15, 1990. Notification of review decisions will be 
mailed by December 15, 1990. Camera-ready papers are due February 1, 1991. Conference proceedings
will be available at the symposium. Selected papers will be published in a dedicated issue of the Journal of 
Parallel and Distributed Computing. The symposium is sponsored by the Orange County Chapter of the IEEE 
Computer Society. 


CONFERENCE CHAIRMAN 


Larry Canter, Computer Systems Approach, Inc. 


PROGRAM COMMITTEE 


V.K. Prasanna Kumar, USC (Program Chair) 


Prith Banerjee, Univ. of Illinois 

Jacob Barhen, JPL & Caltech 

Keith Bromley, NOSC

Mary Eshaghian, Grumman Data Systems 

Jose Fortes, Purdue University 

Kai Hwang, USC 

S. S. Iyengar, LSU 


H. T. Kung, CMU 

Dennis Parkinson, Queen Mary College & AMT 

A. Paulraj, BEL & CDAC 

K. Wojtek Przytula, Hughes 

Sartaj Sahni, Univ. of Minnesota 

H. J. Siegel, Purdue University 

Ab Waksman, AFOSR


Commercial Sponsors: Hughes Aircraft Co., Intel Corp., JPL, Motorola Corp., 

Rockwell International, Toshiba Corp. 


IEEE COMPUTER SOCIETY
THE INSTITUTE OF ELECTRICAL AND
CALL FOR PAPERS


AIDA 90, Sixth Conf. on Artificial Intelli¬ 
gence and Ada: Nov. 15-16, 1990, Reston, Va. 
Sponsors: George Mason Univ. et al. Submit 
four copies of complete paper by June 29, 

1990, to AIDA 90, Computer Science Dept., 
George Mason Univ., 4400 University Dr., 
Fairfax, VA 22030, phone (703) 323-2713, fax 
(703) 323-2630, e-mail aida@gmuvax.gmu. 


1990 IEEE Workshop on Languages
and Architectures for Automation:

Dec. 19-21, 1990, Honolulu, Hawaii. Spon¬ 
sors: Pacific Int’l Center for High Technology 
Research et al. Submit five copies of paper by 
June 30, 1990, to D.Y.Y. Yun, Univ. of Ha¬ 
waii, 711 Kapiolani Blvd., Suite 200, Hono¬ 
lulu, HI 96813-5249, phone (808) 539-1532, 
fax (808) 941-1399. 


IAPR Workshop on Machine Vision Appli¬ 
cations: Nov. 28-30, 1990, Tokyo. Sponsor: 
Int’l Assoc. for Pattern Recognition. Submit
four copies of summary by June 30, 1990, to
Mikio Takagi, Inst. of Industrial Science,
Univ. of Tokyo, 7-22-1 Roppongi, Minatoku, 
Tokyo 106, Japan, phone 81 (3) 479-0289, fax 
81 (3) 423-2834, e-mail takagi@tkl.iis. 
u-tokyo.ac.jp. 


IFIP Working Conf. on Modeling in Com¬ 
puter Graphics: Apr. 8-12, 1991, Tokyo. 
Sponsor: IFIP TC 5/WG 5.10. Submit five copies
of paper by June 30, 1990, to Tosiyasu L.
Kunii, Information Science Dept., Faculty of 
Tokyo, Univ. of Tokyo, 7-3-1 Hongo, Bunkyo- 
ku, Tokyo 113, Japan, phone 81 (3) 816-1783, 
fax 81 (3) 818-4607, e-mail b39756@tansei. 
cc.u-tokyo.ac.jp. 


Seventh Int’l Conf. on Data Engineering:
Apr. 8-12, 1991, Kobe, Japan. Submit
five copies of paper by July 1, 1990, to Nick
J. Cercone, Center for Systems Science, Simon 
Fraser Univ., Burnaby, B.C., Canada, V5A 
1S6, (604) 291-3229, e-mail nick@cs.sfu.ca. 


IEEE Trans. Computers plans a special 
issue on protocol engineering. Submit 
six copies of manuscript by July 1, 1990, to
Ming T. (Mike) Liu, CIS Dept., Ohio State 
Univ., 2036 Neil Ave., Columbus, OH 43210- 
1277, phone (614) 292-1837, e-mail liu@cis. 
ohio-state.edu. 


1990 Fall VHDL Users’ Group Meeting: Oct. 
14-17, 1990, Oakland, Calif. Submit abstract 
by July 1, 1990, to David Barton, Intermetrics, 
4733 Bethesda Ave., Bethesda, MD 20814, 
phone (301) 657-3775; or Doug Perry, Synopsys,
1098 Alta Ave., Mountain View, CA
94043, phone (415) 962-5000.

Third IEE Conf. on Telecommunications: 

Mar. 17-20, 1991, Edinburgh, Scotland. Sponsor:
Inst. of Electrical Engineers. Submit summary
by July 2, 1990, to Conf. Services, IEE,
Savoy Place, London WC2R 0BL, UK, phone 
44 (71) 240-1871, fax 44 (71) 240-7735. 

10th IEEE Int’l Phoenix Conf. on 
Computers and Communications: Mar. 27- 
30, 1991, Scottsdale, Ariz. Sponsors: IEEE, 
IEEE Communications Society. Submit five 
copies of abstract and complete paper by July 
14, 1990, to James A. Weeldreyer, Honeywell, 
Industrial Automation Systems Div., MS 2E5, 
16404 N. Black Canyon Hwy., Phoenix, AZ 
85023, phone (602) 863-5983, e-mail jw- 
ipccc@enuxha.eas.asu.edu. 

Fourth CSI/IEEE Int’l Symp. on VLSI 
Design: Jan. 5-8, 1991, New Delhi. 
Sponsors: Computer Society of India et al. 
Submit six copies of paper by July 15, 1990, to
Lalit M. Patnaik, Computer Science and Automation,
Indian Inst. of Science, Bangalore
560012, India, phone 91 (0812) 342-451, e- 
mail !uunet! shakti!turing!lalit@shakti.uu. 
net; or Adit D. Singh, Electrical and Computer 
Engineering Dept., Univ. of Massachusetts, 
Amherst, MA 01003, phone (413) 545-0188, e- 
mail singh@ecs.umass.edu. 

Third UNB Artificial Intelligence Work¬ 
shop: Oct. 1, 1990, Fredericton, N.B., Canada. 


Sponsor: Univ. of New Brunswick. Submit 
four copies of extended abstract by July 15, 
1990, to B.G. Nickerson, School of Computer 
Science, Univ. of New Brunswick, PO Box 
4400, Fredericton, N.B., Canada E3B 5A3, 
phone (506) 453-4566, fax (506) 453-3566, e- 
mail bgn@unb.ca. 

Melecon 91, Fifth Mediterranean Electrotechnical
Conf.: May 22-24, 1991, Ljubljana,
Yugoslavia. Cosponsors: IEEE Region 8 Yugoslavia
Section et al. Submit five copies of
summary by July 15, 1990, to Melecon 91 Secretariat,
Fakulteta za elektrotehniko, Trzaska
25, 61001 Ljubljana, Yugoslavia, phone 38 
(61) 265-161, fax 38 (61) 264-990. 

28th Allerton Conf. on Communication, 
Control, and Computing: Oct. 3-5, 1990, 
Monticello, Ill. Submit extended abstract for 
regular paper and summary for short paper by 
July 16, 1990, to Allerton Conf., c/o Donna J. 
Brown, Univ. of Illinois at Urbana-Champaign,
Coordinated Science Lab, 1101 W.
Springfield Ave., Urbana, IL 61801, phone
(217) 244-0581, e-mail djb@uicsl.csl.uiuc. 
edu. 

Call for papers and referees for Computer

Computer seeks articles for inclusion in an upcoming special issue. Computer
Generated Music has been selected as the theme for the July 1991
edition. The issue will be devoted to examining the driving forces in the field from a
computational standpoint, assessing the limits of computer music in the general
field of music, and discussing future desirable directions. See the April 1990 issue
of Computer (p. 127) for complete information.

Abstracts are due by August 30, 1990, and four copies of the full manuscript and
four audio cassettes are due by October 30, 1990. Notification of acceptance is set
no later than December 31, 1990, and the final version of the manuscript is due no
later than March 30, 1991.

For submittal to Computer, manuscripts must not have been previously published
or currently submitted for publication elsewhere. Each manuscript should be
no more than 32 typewritten, double-spaced pages long, including all figures and
references. Each submittal should have a cover page that contains the title of the
article, the full name(s) and affiliation(s) of the author(s), complete postal and electronic
address(es), telephone number(s), a 300-word abstract, and a list of keywords
identifying the central issues of the manuscript’s contents. The final manuscript
should have approximately 7,500 words and no more than 12 references.

Submissions should be sent to Denis Baggi, Istituto Dalle Molle per Studi sull’Intelligenza
Artificiale, Corso Elvezia 36, 6900 Lugano, Switzerland, phone 41 (91)
56 15 78, Europe e-mail denis%idsia.uucp@chx400.switch.ch, US e-mail
baggi@berkeley.edu.

If you are willing to review articles, please send a note to Denis Baggi or Bruce
Shriver, editor-in-chief of Computer, with a list of your technical interests and qualifications.
Shriver's address is Bruce D. Shriver, University of Southwestern Louisiana,
PO Drawer 42730, Lafayette, LA 70504-2730, phone (318) 231-5811, fax
(318) 265-5472, Internet shriver@usl.edu.

Electrosoft plans a special issue on software
for electrical engineering education. Publisher:
Computational Mechanics Publications.
Submit paper by July 22, 1990, to D.J.
Glover, Electrical and Computer Engineering 
Dept., 409 Dana Bldg., Northeastern Univ., 
Boston, MA 02115, phone (617) 437-3007. 

Int’l J. Computer-Aided VLSI Design plans a 
special early-1991 issue on VLSI testing. Pub¬ 
lisher: Ablex. Submit five copies of complete 
manuscript by July 31, 1990, to Sunil R. Das, 
Electrical Engineering Dept., Faculty of Engineering,
Univ. of Ottawa, Ottawa, Ont., Canada
K1N 6N5, phone (613) 564-3374, fax
(613) 564-7681, e-mail das@uotelg01 or Bit- 
net srdpb@uottawa. 

Workshop on Parallel and Distributed 
Simulation: Jan. 21-23, 1991, Anaheim, Calif. 
Submit six copies of full paper by July 31,
1990, to Richard M. Fujimoto, School of Information
and Computer Science, Georgia Inst. of
Technology, Atlanta, GA 30332, phone (404) 
853-9384, e-mail fujimoto@prism.gatech. 
edu. 

SEARCC 90, South East Asia Regional 
Computer Confederation Conf.: Dec. 4-8, 
1990, Manila. Sponsor: Philippine Computer 
Society. Submit draft by July 31, 1990, to Vic¬ 
tor B. Gruet, Computer Information Systems, 
CIS Bldg., Meralco Compound, Ortigas Ave.,
1602 Pasig, Metro Manila, Philippines, phone 
(632) 722-1251, fax (632) 722-0141. 

IEEE Software plans a special issue in
March 1991 on testing and debugging.
The issue will review the status of the two areas
and present state-of-the-art techniques. Submit
eight copies of article by Aug. 15, 1990, to
Carl K. Chang, EECS Dept., Univ. of Illinois,
Box 4348, Chicago, IL 60680, phone (312) 
996-4860, fax (312) 413-1386, Compmail+ 
c.chang, e-mail ckchang@uicbert.eecs.uic. 


Second European Distributed Memory 
Computing Conf.: Apr. 22-24, 1991, Munich, 
West Germany. Cosponsors: Gesellschaft für
Informatik et al. Submit paper by Aug. 15,
1990, to Arndt Bode, Computer Science, Technische
Univ. Munich, POB 20-24-20, D-8000
Munich 2, Federal Republic of Germany, 
phone 49 (89) 2105-8240, e-mail bode@ 
infovax.informatik.tu-muenchen.dbp.de. 

Int’l Workshop on Network and Operating 
System Support for Digital Audio and 
Video: Nov. 8-9, 1990, Berkeley, Calif. Sponsor:
Int’l Computer Science Inst. Submit abstract
by Aug. 15, 1990, to Ramesh Govindan,
ICSI, 1947 Center St., Suite 600, Berkeley, CA 
94704-1105, phone (415) 642-4274, ext. 136, 
e-mail av-workshop@berkeley.edu. 

Int’l J. Computer-Aided VLSI Design plans a 
special issue on VLSI/systolic arrays. Pub¬ 
lisher: Ablex. Submit five copies of full papers 
by Aug. 30, 1990, to Bijan Karimi, Electrical
and Computer Engineering Dept., Univ. of 
New Haven, West Haven, CT 06516, phone 
(203) 932-7164. 

ETC 91, 1991 European Test Conf.:
Apr. 17-19, 1991, Munich, West Germany.
Sponsor: VDE (Zentralstelle Tagungen
und Seminare). Submit four copies of abstract
or full paper by Aug. 31, 1990, to ETC 91, c/o
Bennetts Associates, Burridge Farm, Burridge,
Southampton SO3 7BY, UK, fax (44)
489-579519. 


CAIA 91, Seventh IEEE Conf. on Artificial
Intelligence Applications: Feb.
24-28, 1991, Miami Beach, Fla. Submit paper
by Aug. 31, 1990, to Tim Finin, Center for Advanced
Information Technology, Unisys, 70E
Swedesford Rd., PO Box 517, Paoli, PA 19301,
phone (215) 648-2840, fax (215) 648-2288, e- 
mail finin@prc.unisys.com. 

Electrosoft plans a special issue on software 
for system transient modeling. Publisher: 
Computational Mechanics Publications. Submit
paper by Sept. 2, 1990, to H.W. Dommel,
Electrical Engineering Dept., Univ. of British 
Columbia, 2356 Main Hall, Vancouver, B.C., 
Canada V6T 1W5, phone (604) 228-2793. 

EDAC 91, European Design Automation 
Conf.: Feb. 25-28, 1991, Amsterdam. Cosponsor:
IEEE. Submit paper by Sept. 3, 1990, to
Secretariat, EDAC 91, CEP Consultants, 26- 
28 Albany St., Edinburgh EH1 3QH, Scotland, 
phone 44 (31) 557-2478, fax 44 (31) 557-5749. 

CG Int’l 91: June 22-28, 1991, Cambridge, 
Mass. Cosponsors: Computer Graphics Soci¬ 
ety, MIT. Submit six copies of summary by 
Sept. 4, 1990, and six copies of full paper by
Nov. 5, 1990, to N.M. Patrikalakis, MIT Rm. 5-
428, 77 Massachusetts Ave., Cambridge, MA 
02139, phone (617) 253-4555, fax (617) 253- 
8125, e-mail nmp@deslab.mit.edu. 


ICSE 13, 13th Int’l Conf. on Software
Engineering: May 13-16, 1991, Austin,
Texas. Cosponsor: ACM. Submit eight copies
of paper by Sept. 14, 1990, to David Barstow,
Schlumberger Lab for Computer Science, PO 
Box 200015, Austin, TX 78720-0015. 


First IEEE Int’l Workshop on Interoperability
in Multidatabase Systems:
Apr. 8-9, 1991, Kyoto, Japan. Submit seven
copies of extended abstract by Sept. 15, 1990,
to Marek Rusinkiewicz, Univ. of Houston, 
Computer Science Dept., Houston, TX 77204- 
3475, phone (713) 749-4791, e-mail marek@ 
cs.uh.edu; or Yahiko Kambayashi, Kyushu 
Univ., Computer Science and Computer Engineering
Dept., Hakozaki, Fukuoka 812, Japan,
fax 81 (92) 641-1825, e-mail yahiko@csce. 
kyushu-u.ac.jp. 

CCW 91, Third IEEE Conf. on Computer
Workstations: May 15-17, 1991,
Cape Cod, Mass. Sponsor: IEEE Technical
Committee on Operating Systems. Submit five
copies of paper by Sept. 15, 1990, to Keith
Marzullo, Computer Science Dept., Upson 
Hall, Cornell Univ., Ithaca, NY 14853. 

Second Int’l Symp. on Database Systems
for Advanced Applications: Apr.
2-4, 1991, Tokyo. Sponsor: IPSJ. Submit three
copies of full paper by Sept. 15, 1990, to Akifumi
Makinouchi, Computer Science and Communication
Engineering Dept., Kyushu Univ.,
Hakozaki 6-10-1, Fukuoka 812, Japan, phone 
81 (92) 641-1101, ext. 6055, fax 81 (92) 641-1101,
ext. 5418, e-mail akifumi@vax88.csce.
kyushu-u.ac.jp. 


RTA 91, Fourth Int’l Conf. on Rewriting 
Techniques and Applications: Apr. 10-12, 
1991, Como, Italy. Sponsors: State Univ. of 
Milan. Submit 10 copies of extended abstract 
or full paper by Sept. 15, 1990, to Ronald V.
Book, Theoretische Informatik, Inst. für Informatik,
Univ. Würzburg, Am Hubland, D-8700
Würzburg, West Germany, US phone (805)
961-2778, e-mail book%henri@hub.ucsb. 
edu. 


Fifth Parallel Processing Symp.: Mar. 27-29, 
1991, Newport Beach, Calif. Submit five copies
of paper by Sept. 15, 1990, to Larry H. Canter,
Computer Systems Approach, 1140 S.
Raymond Ave., Suite B, Fullerton, CA 92631. 

1991 IEEE Int’l Conf. on Robotics and Auto¬ 
mation: Apr. 7-12, 1991, Sacramento, Calif. 
Sponsor: IEEE Robotics and Automation Soci¬ 
ety. Submit four copies of paper by Sept. 16, 
1990, to T.J. Tarn, Systems Science and Mathematics,
Campus Box 1040, Washington
Univ., St. Louis, MO 63130. 


Fourth Int’l Conf. on Industrial and
Engineering Applications of Artificial
Intelligence and Expert Systems: June 2-5,
1991, Kauai, Hawaii. Sponsors: ACM et al.
Submit four copies of extended abstract by
Oct. 1, 1990, to Jim Bezdek, Computer Science
Div., Univ. of West Florida, Pensacola, FL 
32514, phone (904) 474-2784, fax (904) 474- 
2096, e-mail jbezdek@uwf.bitnet. 

CHI 91, 1991 Conf. on Human Factors
in Computing Systems: Apr. 28-May 2,
1991, New Orleans. Sponsor: ACM. Submit
six copies of abstract/paper by Oct. 1, 1990, to
Peter Polson, Psychology Dept., Univ. of
Colorado, Muenzinger Hall, Campus Box 345, 
Boulder, CO 80309-0345, phone (303) 492- 
5622, e-mail ppolson@clipr.colorado.edu. 


1991 IEEE Int’l Symp. on Information The¬ 
ory: June 23-29, 1991, Budapest, Hungary. 
Submit short paper by Oct. 1, 1990, and long
paper by Nov. 1, 1990, to Anthony Ephremides,
Electrical Engineering Dept., Univ. of
Maryland, College Park, MD 20742, phone 
(301) 454-6871, e-mail tony@eng.umd.edu. 


ISCAS 91, 24th IEEE Int’l Symp. on Circuits
and Systems: June 11-14, 1991, Singapore.
Sponsor: IEEE Circuits and Systems Society.
Submit six copies of summary by Oct. 1,
1990, to Technical Program Chair, ISCAS 91 
Secretariat, Communication Int’l Associates, 
44/46 Tanjong Pagar Rd., Singapore 0208, 
phone (65) 226-2823, fax (65) 226-2877. 


Advanced Research in VLSI Conf.: Mar. 25- 
27, 1991, Santa Cruz, Calif. Submit five copies 
of draft paper by Nov. 1, 1990, to Carlo H. Sequin,
Univ. of California, CS Div., 529B Evans
Hall, Berkeley, CA 94720. 


SCM 3, Third Int’l Workshop on Software
Configuration Management:
June 12-14, 1991, Trondheim, Norway. Cosponsors:
ACM et al. Submit four copies of position
paper and full paper by Nov. 15, 1990, to
Peter Feiler, Software Engineering Inst., Car¬ 
negie Mellon Univ., Pittsburgh, PA 15213- 
3890, phone (412) 268-7790, e-mail phf@sei. 


June 1990

Fourth Int’l Conf. on Data Communication 
Systems and Their Performance, June 20-22,
Barcelona. Cosponsors: IFIP et al. Contact
Barcelona Relaciones Publicas, c/o Pau Claris, 
138-7.4, Edificio Layetana, E-08009 Barce¬ 
lona, Spain, phone (3) 215-7214, fax (3) 215- 
7287. 

Second Int’l Conf. on Software Engineering
and Knowledge Engineering, June 21-23,
Skokie, Ill. Sponsors: Knowledge Systems
Inst., Univ. of Pittsburgh, and Inst. for Information
Industries, Taiwan. Contact Shi-Kuo
Chang, Computer Science Dept., Univ. of 
Pittsburgh, 322 Alumni Hall, Pittsburgh, PA 
15260, phone (412) 624-8490. 

ACM Tutorial Week 90, June 21-23, Los An¬ 
geles. Contact Dave Oppenheim, Abacus Pro¬ 
gramming, 14545 Victory Blvd., Van Nuys, 
CA 91411, phone (818) 785-8000. 

DAC 90, 27th ACM/IEEE Design
Automation Conf., June 24-29,
Orlando, Fla. Contact Pat Pistilli, MP Associates,
7490 Clubhouse Rd., Suite 102, Boulder,
CO 80301, phone (303) 530-4333. 

NECC 90, 11th Nat’l Educational Comput¬ 
ing Conf., June 25-27, Nashville, Tenn. Spon¬ 
sor: Int’l Council for Computers in Education. 
Contact John D. McGregor, Computer Studies 
Dept., Murray State Univ., Murray, KY 42071, 
phone (502) 762-2614. 

ROV 90, Remotely Operated Vehicle Conf., 
June 25-27, Vancouver, B.C., Canada. Co¬ 
sponsors: British Columbia Section of the Ma¬ 
rine Technology Society et al. Contact J.S. 
Collins, Electrical and Computer Engineering 
Dept., Univ. of Victoria, Victoria, B.C., Can¬ 
ada, phone (604) 721-8684, fax (604) 721- 
8676. 


Int’l Symp. on Fuzzy Approach to Reasoning
and Decision Making, June 25-29, Bechyne,
Czechoslovakia. Cosponsor: Int’l Fuzzy System
Assoc. Contact Vilem Novak, Mining Inst.,
Czechoslovakia Academy of
Sciences, A. Rimana 1768, 70800 Ostrava-Poruba,
Czechoslovakia.

CGI 90, Computer Graphics Int’l
1990, June 25-29, Kent Ridge, Singapore.
Sponsors: Computer Graphics Society,
Inst. of Systems Science, Singapore. Contact
Juzar Motiwalla, CGI 90, ISS, Nat’l Univ. of
Singapore, Heng Mui Keng Terr., Kent Ridge,
Singapore 0511, phone (65) 772-2751; or Victorine
Toh, Inst. of Systems Science, Nat’l
Univ. of Singapore, Kent Ridge, Singapore 
0511, phone (65) 772-2003, fax (65) 778-2571. 


In the accompanying Calendar, the IEEE Computer Society logo identifies
the conferences the society is sponsoring and participating in. Other conferences
of interest to our readers, as well as their sponsors, are also listed.

For inclusion in Call for Papers or Calendar, submit information at least six 
weeks before the month of publication (i.e., for the August 1990 issue, send
information for receipt by June 15, 1990) to Chuck Governale, Calendar Dept.,
Computer, PO Box 3014, Los Alamitos, CA 90720-1264.


Advanced Research Workshop on 3D Imag¬ 
ing in Medicine, June 25-29, Travemuende, 
West Germany. Sponsor: NATO. Contact 
Linda Houseman, Computer Science Dept., 
Univ. of North Carolina, Box 3175, Sitterson 
Hall, Chapel Hill, NC 27599, phone (919) 962- 
1758 (for the Americas); or Andreas Pommert, 
Inst. für Mathematik und Datenverarbeitung in
der Medizin, Univ. Krankenhaus Eppendorf, 
Martinistrasse 52, 2000 Hamburg 20, Federal 
Republic of Germany, phone 49 (40) 468-2300 
(for Europe, Asia, Australia, and Africa). 

EKAW 90, Fourth European Knowledge 
Acquisition for Knowledge-Based Systems 
Workshop, June 25-29, Amsterdam. Contact 
John H. Boose, Advanced Technology Center, 
Boeing Computer Services 7L-64, PO Box 
24346, Seattle, WA 98124, phone (206) 865- 
3253. 


First Int’l Conf. on Expert Planning Sys¬ 
tems, June 27-29, Brighton, U.K. Sponsor: 
IEE. Contact Conf. Services, Institution of 
Electrical Engineers, Savoy Place, London 
WC2R 0BL, UK, phone 44 (71) 240-1871, fax 
44 (71) 240-7735. 

Fifth Rocky Mountain Conf. on Artificial
Intelligence, June 28-30, Las Cruces, N.M.
Cosponsors: ACM et al. Contact
Paul McKevitt, Computing Research Lab, 
Dept. 3CRL, Box 30001, New Mexico State 
Univ., Las Cruces, NM 88003-0001, phone 
(505) 646-5109, fax (505) 646-6218, Internet 
paul@nmsu.edu. 


July 1990 


FTCS 20, 20th Int’l Symp. on Fault-Tolerant
Computing, June 26-28,
Newcastle upon Tyne, England. Cosponsors:
Centre for Software Reliability, British Com¬ 
puter Society, IEE. Contact Neil Speirs, Com¬ 
puting Lab, Univ. of Newcastle upon Tyne, 
Newcastle upon Tyne, NE1 7RU, UK, phone 
44 (91) 232-8511. 

Compass 90, Fifth Conf. on Computer As¬ 
surance: Systems Integrity, Software 
Safety, and Process Security, June 26-29, 
Gaithersburg, Md. Cosponsors: IEEE Aero¬ 
space and Electronics Society, IEEE Nat’l 
Capital Area Council. Contact Dolores X. Wallace,
Nat’l Inst. of Standards and Technology,
Technology Bldg., B266, Gaithersburg, MD 
20899, phone (301) 975-3340. 


1990 ACM Conf. on Lisp and Functional 
Programming, June 27-29, Nice, France. 
Contact Gilles Kahn, INRIA Sophia-Antipolis,
2004 Route des Lucioles, 06565
Valbonne Cedex, France, phone (33) 93-65- 
78-01. 


Roundtable Discussion on Vision-Based 
Vehicle Guidance, July 2, Tokyo. Sponsor: 
Committee of IEEE Int’l Workshop on Intelli¬ 
gent Robots and Systems. Contact Ichiro 
Masaki, Computer Science Dept., GM Re¬ 
search Labs, 30500 Mound Rd., Warren, MI 
48090-9055, phone (313) 986-1466. 


DPDS 90, Second Int’l Symp. on Databases
in Parallel and Distributed Systems,
July 2-4, Dublin, Ireland. Cosponsors:
ACM et al. Contact Rakesh Agrawal, AT&T
Bell Labs, Rm. 3D450, 600 Mountain Ave.,
Murray Hill, NJ 07974, phone (201) 582-2250;
or David Bell, Inst. of Informatics, Univ. of Ulster,
Jordanstown, County Antrim, Northern
Ireland BT37 0QB, phone (0232) 365-131.


SPAA 90, Second ACM Symp. on Parallel
Algorithms and Architecture,
July 2-6, Crete, Greece. Cosponsor: ACM.
Contact Tom Leighton, MIT, Math Dept. and
Computer Science Lab, Cambridge, MA
02139, phone (617) 253-3662, e-mail
ftl@math.mit.edu.


Conf. on Visual Information Assimilation in 
Man and Machine, June 27-29, Ann Arbor, 
Mich. Contact Univ. of Michigan Extension 
Service, Conference and Institutes Dept., 200 
Hill St., Ann Arbor, MI 48104-3297, phone 
(313) 764-5305. 


Fourth TC2 Working Conf. on Database 
Semantics, July 2-6, Windermere Lake Dis¬ 
trict, UK. Sponsors: IFIP, Coopers and 
Lybrand UK. Contact William Kent, Hewlett- 
Packard Labs, Dept. 3U, 1501 Page Mill Rd., 
Palo Alto, CA 94304-0971; or Robert Meersman,
Infolab, Tilburg Univ., PO Box 90153,
5000 LE Tilburg, The Netherlands. 


Second Int’l Conf. on Economics and 
Artificial Intelligence, July 4-6, Paris. 
Cosponsors: AFCET et al. Contact Assoc.
Francaise pour la Cybernetique Economique et
Technique, 156 Bd. Pereire, 75017 Paris,
France, phone 33 (1) 4766-2419, fax 33 (1)
4267-9312; J-L. Le Moigne, GRASCE, Univ.
Aix Marseille III, 3 ave. Robert Schuman,
13628, Aix en Provence, France; or P.
Bourgine, 26 rue St. Louis, 78000, Versailles,


Fifth IEEE Structure in Complexity
Theory Conf., July 7-11, Barcelona,
Spain. Contact Stephen R. Mahaney, Computer
Science Dept., Univ. of Arizona, Gould-Simpson
721, Tucson, AZ 86721, phone (602)
621-2733. 


SPIE 1990 Int’l Symp. on Optical and Optoelectronic
Applied Science and Engineering,
July 8-13, San Diego, Calif. Contact Int’l
Society for Optical Engineering, PO Box 10, 
Bellingham, WA 98227-0010, phone (206) 
676-3290, fax (206) 647-1445. 

Navy Micro 90 Conf., July 9-12, San Diego, 
Calif. Sponsor: Navy Regional Data Automa¬ 
tion Center. Contact Code 31.4, NARDAC San 
Diego, NAS North Island, Bldg. 1482, San Di¬ 
ego, CA 92135-5110, phone (619) 545-8645. 


WCCE 90, Fifth World Conf. on Computers
in Education, July 9-13,
Sydney, Australia. Cosponsors: IFIP et al.
Contact WCCE 90, PO Box 319, Darlinghurst, 
NSW 2010, Australia, phone (612) 211-5855. 


Iberamia 90, Second Ibero-American Conf. 
on AI, July 9-13, Morelia, Michoacan, Mexico.
Sponsors: Centro Regional de Ensenanza
en Informatica (Spain) et al. Contact Iberamia 
90, Atn. Srita. Ma. Antonieta Alvarez Perez, 
Apartado Postal 70302, C.P. 04510, Mexico, 
D.F. 


Contact Computer Science Dept., Univ. of 
Warwick, Coventry CV4 7AL, UK, phone 44 
(203) 523-194. 

1990 Summer Computer Simulation Conf., 
July 16-18, Calgary, Alta., Canada. Sponsor: 
Society for Computer Simulation. Contact 
SCS, PO Box 17900, San Diego, CA 92117- 
7900, phone (619) 277-3888. 

SIAM Annual Meeting, July 16-20, Chicago. 
Sponsor: Society for Industrial and Applied 
Mathematics. Contact SIAM, 3600 University 
City Science Center, Philadelphia, PA 19104- 
2688, phone (215) 382-9800, fax (215) 386- 
7999, e-mail siam@wharton.upenn.edu. 

Int’l Workshop on Semantics for Concur¬ 
rency, July 23-25, Leicester, UK. Sponsor: 
British Computer Society. Contact Marta 
Kwiatkowska, Workshop on Semantics for
Concurrency, Computing Studies Dept., Univ.
of Leicester, Leicester LE1 7RH, UK, phone
(44) 533-523603. 

Int’l Workshop on Principles of Diagnosis, 
July 23-25, Menlo Park, Calif. Cosponsors: 
American Assoc. for Artificial Intelligence,
Price Waterhouse. Contact Walter Hamscher,
Price Waterhouse Technology Center, 68 Willow
Rd., Menlo Park, CA 94025, phone (415)
688-6669, e-mail hamscher@pw.com. 

10th Int’l Conf. in Computer Science, July 
23-27, Santiago, Chile. Contact Joachim von 
zur Gathen, Computer Science Dept., Univ. of 
Toronto, 10 King’s College Rd., Toronto, Can¬ 
ada M5S 1A4, phone (416) 978-6024, e-mail 
gathen@theory.toronto.edu. 

DIAC 90, Directions and Implications of 
Advanced Computing, July 28, Boston. 
Sponsor: Computer Professionals for Social 
Responsibility. Contact Douglas Schuler, 
Boeing Computer Services, MS 7L-64, PO 
24346, Seattle, WA 98124-0346, phone (206) 
634-2771. 


Int’l Neural Network Conf., July 9-13, Paris. 
Sponsors: IEEE Neural Networks Council,
Int’l Neural Network Society. Contact Nina
Thellier, NTC, 19 rue de la Tour, 75116 Paris 
Cedex, France, phone (33) 45-25-65-65, fax 
(33) 45-25-24-22. 

ICWES 9, Ninth Int’l Conf. of Women Engi¬ 
neers and Scientists, July 14-20, Warwick, 
UK. Sponsor: UK Women’s Engineering Soci¬ 
ety. Contact Conf. Services, ICWES 9, 55 New 
Cavendish St., London W1M 7RE, UK, phone 
(71) 486-0531, fax (71) 935-7759. 

Third Int’l Conf. on Industrial and
Engineering Applications for AI and
Expert Systems, July 15-18, Charleston, S.C.
Cosponsors: ACM et al. Contact Moonis Ali, 
Univ. of Tennessee Space Inst., MS15, B.H. 
Goethert Pkwy., Tullahoma, TN 37388-8897, 
phone (615) 455-0631, fax (615) 454-2354,
e-mail alif@utsivl.bitnet.

ICALP 90, 17th Int’l Colloquium on Auto¬ 
mata, Languages, and Programming, July 
16-20, Coventry, England. Sponsor: European 
Assoc. for Theoretical Computer Science.


AAAI 90, Nat’l Conf. on Artificial Intelli¬ 
gence, July 29-Aug. 3, Boston. Sponsor: 
American Assoc. for Artificial Intelligence.
Contact AAAI, 445 Burgess Dr., Menlo Park,
CA 94025, phone (415) 328-3123, fax (415)
321-4457; Edward Lafferty, AI Center, Mitre,
MS A350, Burlington Rd., Bedford, MA
01730, phone (617) 271-2773 (for workshop);
or Marcel Schoppers, Advanced Decision Systems,
1500 Plymouth St., Mountain View, CA
94043, phone (415) 960-7553, e-mail
marcel@ads.com (for control systems workshop).


August 1990 


SIGGraph 90, 17th Conf. on Computer
Graphics and Interactive Techniques,
Aug. 6-10, Dallas. Sponsor: ACM.
Contact Assoc. for Computing Machinery, 11
W. 42nd St., New York, NY 10036, phone 
(212) 869-7440; or SIGGraph 90, 111 E. 
Wacker Dr., Suite 600, Chicago, IL 60601, fax 
(312) 938-1232. 

16th Int’l Conf. on Very Large Data
Bases, Aug. 13-16, Brisbane, Australia.
Contact David Reiner, Lotus Development, 1
Canal Park, Cambridge, MA 02141, phone 
(617) 577-8500, e-mail dreiner@lotus.com. 

ICPP 90, 19th Int’l Conf. on Parallel Pro¬ 
cessing, Aug. 13-17, St. Charles, Ill. Sponsor: 
Pennsylvania State Univ. Contact Benjamin 
W. Wah, Coordinated Science Lab., Univ. of 
Illinois, 1101 W. Springfield Ave., Urbana, IL
61801-2082, phone (217) 333-3516; or Tse- 
yun Feng, EE East Bldg., Pennsylvania State 
Univ., University Park, PA 16802, phone (814) 
863-1469. 

TAU 90, 1990 Int’l Workshop on Timing Issues
in the Specification and Synthesis of
Digital Systems, Aug. 15-17, Vancouver, 
B.C., Canada. Sponsor: ACM. Contact Robert 
K. Brayton, Electrical Engineering and Com¬ 
puter Science Dept., Univ. of California at 
Berkeley, Berkeley, CA 94720.

Int’l Symp. on Algorithms, Aug. 16-18, To¬ 
kyo. Sponsor: IPSJ Special Interest Group on 
Algorithms. Contact Tetsuo Asano, Osaka 
Electro-Communication Univ., Hatsu-cho, 
Neyagawa, Osaka 572, Japan, phone 81 (720) 
24-1131. 


UPADI 90, 21st Convention of the Pan
American Federation of Engineering Societies,
Aug. 19-24, Washington, DC. Cosponsors:
American Assoc. of Engineering Societies,
American Society of Civil Engineers.
Contact UPADI 90, ASCE, 345 E. 47th St.,
New York, NY 10017, phone (212) 705-7218.

Hot Chips 2 Symp., Aug. 20-21, Santa
Clara, Calif. Contact Hasan S. AlKhatib,
EECS Dept., Santa Clara Univ., Santa Clara, 
CA 95053, phone (408) 554-4485, fax (408) 
554-5475, e-mail halkhatib@scu.edu. 

Second Int’l Joint Conf. of ISSAC 90 (1990 
Int’l Symp. on Symbolic and Algebraic 
Computation) and AAECC 8 (Eighth Int’l 
Conf. on Applied Algebra, Algebraic Algo¬ 
rithms, and Error-Correcting Codes), Aug. 
20-24, Tokyo. Cosponsors: ACM et al. Contact 
Conf. Secretariat, IJC-2, Scientist, Inc., 
Yamazaki Bldg., 3-2 Kanda Suruga-dai, 
Chiyoda-ku, Tokyo 101, Japan. 


Coling 90, 13th Int’l Conf. on Computational
Linguistics, Aug. 20-25, Helsinki, Finland.
Contact Hans Karlgren, KVAL, Skeppsbron
26, S-111 30 Stockholm, Sweden.


September 1990 


ISPRS Commission V Symp., Close-Range
Photogrammetry Meets Machine
Vision, Sept. 3-7, Zurich. Cosponsors:
Int’l Society for Photogrammetry and Remote
Sensing et al. Contact Armin Gruen, Inst. of
Geodesy and Photogrammetry, ETH-Hoenggerberg,
CH-8093, Zurich, Switzerland,
phone 41 (1) 377-3051. 

EuroVHDL 90, First European Working
Conf. on VHDL Methods, Sept. 4-7,
Marseille, France. Cosponsors: ACM et al.
Contact Petra Michel, Siemens, A.G. Dept.
ZFE ISEA1, Otto Hahn Ring 6, Munich 83,
West Germany. 

ASAP 90, Int’l Conf. on Application- 
Specific Array Processors, Sept. 5-7, 
Princeton, N.J. Cosponsor: Princeton Univ. 
Contact S.Y. Kung, Electrical Engineering 
Dept., Princeton Univ., Princeton, NJ 08544, 
phone (609) 258-3780. 

13th Int’l ACM/SIGIR Conf. on Research 
and Development in Information Retrieval, 
Sept. 5-7, Brussels. Contact Jean-Luc Vidick, 
Univ. Libre de Bruxelles, Avenue F.D. Roose¬ 
velt, Infodoc, CP 142, 1050 Brussels, Belgium. 

Int’l Workshop on VLSI for Artificial Intel¬ 
ligence and Neural Networks, Sept. 5-7, Ox¬ 
ford, England. Contact Jose G. Delgado-Frias, 
Electrical Engineering Dept., SUNY, Binghamton,
NY 13901, phone (607) 777-4806,
e-mail delgado@bingvaxu.cc.binghamton.edu.


1990 Int’l Electronics Packaging Conf., 
Sept. 9-13, Marlborough, Mass. Sponsor: Int’l 
Electronics Packaging Society. Contact IEPS, 
114 N. Hale St., Wheaton, IL 60187, phone 
(708) 260-1044. 


Workshop on Computers in Systematic Bio¬ 
logy, Sept. 9-14, Davis, Calif. Sponsor: Nat’l 
Science Foundation. Contact Renaud Fortuner,
California Dept. of Food and Agriculture,
Analysis and Identification, Rm. 340, PO
Box 942871, Sacramento, CA 94271-0001, 
phone (916) 445-4521. 


ITC 90, Int’l Test Conf., Sept. 10-12,
Washington, DC. Cosponsor: IEEE
Philadelphia Section. Contact Donald Denburg,
AT&T Bell Labs, 1247 S. Cedar Crest
Blvd., Allentown, PA 18103; or ITC, 1201 
Sussex Turnpike, Suite 101, PO Box 264, Mt. 
Freedom, NJ 07970, phone (201) 895-5260. 


IEEE Conf. on Managing Expert System
Programs and Projects, Sept. 10-12,
Washington, DC. Sponsor: IEEE Computer
Society Technical Committee on Expert
Systems. Contact Jay Liebowitz, Management 
Sciences Dept., George Washington Univ., 
Washington, DC, phone (202) 994-6969. 


Second Int’l Workshop on Advances in Ro¬ 
bot Kinematics, Sept. 10-12, Linz, Austria. 
Sponsors: Research Inst. for Symbolic Computation
et al. Contact Sabine Stifler, RISC,
Johannes Kepler Univ., A-4040 Linz, Austria, 
phone 43 (7236) 3231-50; or Jadran Lenarcic, 
Josef Stefan Inst., Univ. of Edvard Kardelj, 
Jamova 39, 61111 Ljubljana, Yugoslavia, 
phone 38 (61) 214-399. 


Symp. on Object-Oriented Programming 
Emphasizing Practical Applications, Sept. 
14-15, Poughkeepsie, N.Y. Sponsor: Marist 
College. Contact James TenEyck, Marist Col¬ 
lege, Poughkeepsie, NY 12601-1387, phone 
(914) 471-3240, e-mail jzbv@maristb.bitnet. 


ICCD 90, IEEE Int’l Conf. on Computer
Design: VLSI in Computers and
Processors, Sept. 16-19, Cambridge, Mass.
Contact Edward M. Middlesworth, Hewlett-Packard,
Bldg. 25U, PO Box 10350, Palo Alto,
CA 94303-0867, phone (415) 857-5485; or
ICCD 90, IEEE Computer Society, 1730 Massachusetts
Ave. NW, Washington, DC 20036-1903,
phone (202) 371-1013.

Fourth Digital Signal Processing Work¬ 
shop, Sept. 16-19, New Paltz, N.Y. Sponsor: 
IEEE Signal Processing Society. Contact K.S. 
Arun, Coordinated Science Lab, Univ. of Illinois
at Urbana-Champaign, 1101 W. Springfield
Ave., Urbana, IL 61801, phone (217) 333-
7678, fax (217) 244-1764. 

Internal Audit Advanced Technology Forum,
Sept. 17-19, Orlando, Fla. Sponsor: Inst.
of Internal Auditors. Contact Stephen M. Paroby,
Ernst and Young, 787 Seventh Ave., New
York, NY 10019, phone (212) 830-6000. 

ASIC 90, Third IEEE ASIC Seminar 
and Exhibit, Sept. 17-21, Rochester, 
N.Y. Cosponsors: IEEE Rochester Section, 
ACM. Contact Kenneth Hsu, Rochester Inst. of
Technology, Computer Engineering Dept., 
Rochester, NY 14623, phone (716) 475-2655; 
or Lynne Engelbrecht, 170 Mt. Read Blvd., 
Rochester, NY 14611, phone (716) 328-2310, 
fax (716) 436-9370. 


EP 90, Electronic Publishing 90, Sept.
18-20, Gaithersburg, Md. Sponsor: Nat’l
Inst. of Standards and Technology. Contact
Peter R. King, Computer Science Dept., Univ. 
of Manitoba, Winnipeg, Man., Canada R3T 
2N2, phone (204) 474-9935. 


ICARCV 90, Int’l Conf. on Automation,
Robotics, and Computer Vision, Sept. 18-21,
Singapore. Cosponsors: IEEE Singapore
Chapter et al. Contact Dinesh P. Mital, 
ICARCV 90, School of Electrical and Elec¬ 
tronic Engineering, Nanyang Technological 
Inst., Nanyang Ave., Singapore 2263, Repub¬ 
lic of Singapore, phone (65) 660-5399. 


Conf. on Multiuser Interfaces and Applica¬ 
tions, Sept. 24-26, Heraklion, Crete, Greece. 
Cosponsors: IFIP et al. Contact Rena Kalaitzaki,
Computer Science Dept., Univ. of Crete,
GR 714-09 Heraklion, Crete, Greece, phone 30 
(81) 210-057. 


Int’l Workshop on Expert Systems in Engi¬ 
neering, Sept. 24-26, Vienna, Austria. Spon¬ 
sor: Christian Doppler Expert Systems Lab, 
Univ. of Vienna. Contact Wolfgang Nejdl, 
Technical Univ. of Vienna, Applied Computer 
Science Dept., CD Lab for Expert Systems, 
Paniglgasse 16, 1040 Vienna, Austria, fax 43 
(222) 505-5304, e-mail nejdl@vexpert.at. 

Tencon 90, IEEE Region 10 Conf. on Com¬ 
puter and Communication Systems, Sept. 
24-27, Hong Kong. Cosponsor: IEEE Hong 
Kong Section. Contact Y.S. Cheung, Electrical 
and Electronic Engineering Dept., Univ. of 
Hong Kong, Pokfulam, Hong Kong. 

SIGComm 90, Sept. 24-27, Philadelphia. 
Sponsor: ACM SIGComm. Contact David Farber,
Univ. of Pennsylvania, 200 S. 33rd St.,
Philadelphia, PA 19104-6389, phone (215)
898-9508, fax (215) 898-0587, e-mail
farber@cis.upenn.edu; or Phil Karn, Bell
Communications Research, MS 2P-357, 445 
South St., PO Box 1910, Morristown, NJ 
07962-1910, phone (201) 829-4299. 


AIRIES 90, AI Research in the Environmental
Sciences Workshop, Sept. 25-27,
Montreal, Que., Canada. Cosponsors: Univ. of
Quebec at Montreal, Centre de Recherche Informatique
de Montreal. Contact Rosemary M.
Dyer, GL/LYP, AIRIES 90, Air Force Geo¬ 
physics Lab, Hanscom Air Force Base, MA 
01731, fax (617) 377-4498. 


Fourth Conf. on Putting Methods and Tools 
into Practice as Aids to Design Information 
Systems, Sept. 25-27, Nantes, France. Sponsor:
Univ. de Nantes, Inst. Univ. de Technologie,
Lab. d’Informatique, Liana. Contact H.
Habrias, 3 Rue du Marechal Joffre, 44041 Nantes
Cedex 01, France, phone (33) 4030-6090,
fax (33) 4030-6001. 


CI 90, 1990 Int’l Symp. on Computational
Intelligence, Sept. 27-29, Milano,
Italy. Sponsors: ACM, F.I.S. Cassa di
Rosp. o. PC. Contact Giorgio Valle, Universita
Milano, Dip. Scienze Della Informazione, Via
Moretto 20133, Milano, Italy, phone 39 (2) 
757-5228, fax 39 (2) 761-10556, e-mail 
valle@imiucca.bitnet. 


October 1990 


Third UNB Artificial Intelligence Work¬ 
shop, Oct. 1, Fredericton, N.B., Canada. Spon¬ 
sor: Univ. of New Brunswick. Contact B.G. 
Nickerson, School of Computer Science, Univ. 
of New Brunswick, PO Box 4400, Fredericton, 
N.B., Canada E3B 5A3, phone (506) 453-4566, 
fax (506) 453-3566, e-mail bgn@unb.ca. 

15th Conf. on Local Computer
Networking, Oct. 1-3, Minneapolis,
Minn. Contact Marc Cohn, Advanced Development
Div., Raychem Corp., 300 Constitution
Dr., Menlo Park, CA 94025-1164, phone
(415) 361-3902, fax (415) 361-6099. 


Second Int’l Conf. on Algebraic and Logic 
Programming, Oct. 1-3, Nancy, France. Con¬ 
tact Wolfgang Wechler, TU Braunschweig, 
Theoretische Informatik, Postfach 3329, D- 
3300 Braunschweig, West Germany, e-mail 
wechler@infbs.uucp; or Helene Kirchner, 
CRIN, BP239, 54506 Vandoeuvre-les-Nancy 
Cedex, France. 


InfoJapan 90, Int’l Conf. on Information
Technology, Oct. 1-5, Tokyo.
Sponsor: IPSJ. Contact InfoJapan 90 Secretariat,
c/o Simul Int’l, Kowa Bldg. No. 9, 1-8-10,
Akasaka, Minato-ku, Tokyo 107, Japan, phone 
81 (3) 586-8691, fax 81 (3) 583-8336. 


Sixth Int’l Conf. on the Application of
Standards for Open Systems Interconnection,
Oct. 2-4, Gaithersburg, Md. Cosponsor:
Nat’l Inst. of Standards and Technology.
Contact Brenda Gray, NIST/OSI, Rm. B217, 
Bldg. 225, Gaithersburg, MD 20899, phone 
(301) 975-3664. 


28th Allerton Conf. on Communication, 
Control, and Computing, Oct. 3-5, Monticello,
Ill. Contact Allerton Conf., c/o Donna
J. Brown, Univ. of Illinois at Urbana-Champaign,
Coordinated Science Lab, 1101 W.
Springfield Ave., Urbana, IL 61801, phone
(217) 244-0581, e-mail djb@uicsl.csl.uiuc.edu.

1990 IEEE Workshop on Visual Languages,
Oct. 4-6, Skokie, Ill. Cosponsors:
Univ. of Pittsburgh et al. Contact S.K.
Chang, Computer Science Dept., Univ. of 
Pittsburgh, Pittsburgh, PA 15260. 

Frontiers 90, Third Symp. on Frontiers
of Massively Parallel Computation,
Oct. 8-10, College Park, Md. Cosponsors:
Nat’l IEEE Capital Area Chapter, NASA
Goddard Space Flight Center. Contact 
Johanna Weinstein, Frontiers 90, UMIACS, 
Univ. of Maryland, A.V. Williams Bldg., Col¬ 
lege Park, MD 20742, phone (301) 454-1808. 

Future Trends 90, Workshop on Future
Trends of Distributed Computing
Systems, Oct. 8-10, Cairo. Contact Stephen S.
Yau, Univ. of Florida, CIS Dept., Rm. 301, 
Gainesville, FL 32611, phone (904) 335-8006. 

Ninth Symp. on Reliable Distributed
Systems, Oct. 9-11, Huntsville, Ala.
Contact Raif M. Yanney, TRW, MS DH2/2328,
1 Space Park, Redondo Beach, CA
90278, phone (213) 764-6033. 

Northcon 90, Oct. 9-11, Seattle. Cosponsors: 
IEEE et al. Contact Northcon 90 Professional 
Program Committee, c/o Ramona Baker, 8110 
Airport Blvd., Los Angeles, CA 90045-3194, 
phone (213) 215-3796, ext. 222. 

PDCS 90, ISMM Int’l Conf. on Parallel and 
Distributed Computing and Systems, Oct. 
10-12, New York City. Sponsor: Int’l Society
for Mini and Microcomputers. Contact R. 
Ammar, U155, Computer Science and Engineering
Dept., Univ. of Connecticut, Storrs,
CT 06268, fax (203) 486-0318.

EuroForum 90, Fourth European EDIF Fo¬ 
rum, Oct. 11-12, Daresbury, Cheshire, UK. 
Contact Kate Faulkner, EuroForum 90, ICL, 
Manchester M12 5DR, UK, phone 44 (61) 223-
1301, fax 44 (61) 223-1207. 

Second Int’l Conf. on Microelectronics, Oct. 
13-15, Damascus, Syria. Sponsor: Arab 
School of Science and Technology. Contact 
M.I. Elmasry, VLSI Research Group, Univ. of 
Waterloo, Waterloo, Ont., Canada N2L 3G1, 
phone (519) 885-1211, ext. 3753. 

1990 Fall VHDL Users’ Group Meeting,
Oct. 14-17, Oakland, Calif. Contact David
Barton, Intermetrics, 4733 Bethesda Ave., Bethesda,
MD 20814, phone (301) 657-3775; or
Doug Perry, Synopsys, 1098 Alta Ave., Mountain
View, CA 94043, phone (415) 962-5000.

AIPR 19, Workshop on Applied Imagery 
Pattern Recognition, Oct. 17-19, McLean, 
Va. Sponsors: Society of Photooptical Instru¬ 
mentation Engineers, Rome Air Development 
Center. Contact Brian Mitchell, ERIM, PO 
Box 8618, Ann Arbor, MI 48106. 

Third Fall VHDL Users’ Group Meeting,
Oct. 21-24, Redondo Beach, Calif.
Contact Rachel Rusting, Intermetrics, 733 
Concord Ave., Cambridge, MA 02138, phone 
(617) 661-1840. 


12th Saudi Nat’l Computer Conf. on Planning
for the Informatics Society, Oct. 21-24,
Riyadh, Saudi Arabia. Cosponsors: King Saud
Univ., Saudi Computer Society. Contact Mo¬ 
hammad M. Mandurah, College of Computer 
and Information Sciences, PO Box 51178, Ri¬ 
yadh, 11543, Kingdom of Saudi Arabia, phone 
996 (1) 467-6993. 

OOPSLA 90, Fifth Conf. on Object-Ori¬ 
ented Programming Systems, Languages, 
and Applications, Oct. 21-25, Ottawa, Canada.
Sponsor: ACM. Contact Assoc. for Computing
Machinery, 11 W. 42nd St., New York,
NY 10036, phone (212) 869-7440. 

FOCS, 31st Foundations of Computer
Science, Oct. 22-24, St. Louis, Mo. Contact
Christos Papadimitriou, Computer Science
Dept., Univ. of California at San Diego, La 
Jolla, CA 92093, phone (619) 534-2086. 

Int’l Conf. on Computer Applications in De¬ 
veloping Countries, Oct. 22-24, Benin City, 
Nigeria. Sponsor: Large Scale Systems Re¬ 
search Group, Univ. of Benin. Contact E.A. 
Onibere, Mathematics and Computer Science 
Dept., Univ. of Benin, P.M.B. 1154, Benin 
City, Nigeria. 

Ninth National Conf. on EDP System and 
Software Quality Assurance, Oct. 22-24,
Washington, DC. Sponsor: Data Processing
Management Assoc. Contact US Professional 
Development Inst., EDP System and Software 
Quality Assurance, 1734 Elton Rd., Suite 221, 
Silver Spring, MD 20903-1733, phone (301) 
445-4400, fax (301) 445-5722. 

JCIT 5, Fifth Jerusalem Conf. on Information
Technology, Oct. 22-25,
Jerusalem, Israel. Sponsor: Information Processing
Assoc. of Israel. Contact Abraham
Peled, IBM T.J. Watson Research Center, PO 
Box 704, Yorktown Heights, NY 10598. 

CC 90, Third Int’l Workshop on Compiler 
Compilers, Oct. 22-26, Schwerin, East Germany.
Sponsors: German Democratic Republic
Academy of Sciences Inst. of Informatics
and Computing Technique et al. Contact Michael
Albinus, CC 90 Organizing Committee,
Akademie der Wissenschaften der DDR, Inst.
für Informatik und Rechentechnik, Rudower
Chaussee 5, Berlin, GDR-1199.

Third Int’l Symp. on Artificial Intelligence, 
Oct. 22-26, Monterrey, N.L. Mexico. Spon¬ 
sors: ITESM (Inst. Tecnologico y de Estudios 
Superiores de Monterrey) et al. Contact Hugo 
Terashima, Centro de Inteligencia Artificial, 
ITESM, Suc. de Correos “J”, C.P. 64849 Monterrey,
N.L. Mexico, phone 52 (83) 58-2000,
fax 52 (83) 58-0771, e-mail
isai@tecmtyvm.bitnet.

Visualization 90, Oct. 23-26, San Francisco.
Contact Bruce Brown, Oracle
Corp., 20 Davis Dr., Belmont, CA 94002, 
phone (415) 598-3628. 

ESORICS 90, European Symp. on Research 
in Computer Security, Oct. 24-26, Toulouse, 
France. Sponsor: AFCET. Contact Martin 
Gilles, 16 Parc de Diane, 78350 Jouy en Josas,
Toulouse Cedex, France. 


First Japanese Knowledge Acquisition for 
Knowledge-Based Systems Workshop, Oct. 
25-26, Kyoto, Japan, and Oct. 29-31, Tokyo. 
Cosponsors: Kansai Inst. of Information Systems
et al. Contact John H. Boose, Advanced
Technology Center, Boeing Computer Ser¬ 
vices 7L-64, PO Box 24346, Seattle, WA 
98124, phone (206) 865-3253. 

NACLP 90, 1990 North American
Conf. on Logic Programming, Oct. 28-Nov. 1,
Austin, Texas. Cosponsor: ACM. Contact
Carlo Zaniolo, MCC, 3500 W. Balcones
Center Dr., Austin, TX 78759, phone (512) 
338-3442. 

Int’l Conf. on Information Technology, Oct. 
29-31, Bournemouth, UK. Sponsor: Institu¬ 
tion of Electrical Engineers. Contact Conf. 
Services, IEE, Savoy Place, London WC2R 
0BL, UK, phone 44 (71) 240-1871, fax 44 (71) 
240-7735. 

Eighth Pacific Northwest Software Quality 
Conf., Oct. 29-31, Portland, Ore. Sponsor: 
PNSQC Committee. Contact Terri Moore, Pa¬ 
cific Agenda, PO Box 10142, Portland, OR 
97210, phone (503) 223-8633. 

ISCIA 5, Fifth Int’l Symp. on Computer and 
Information Sciences, Oct. 30-Nov. 2, Cappadocia,
Nevsehir, Turkey. Sponsors: Istanbul
Technical Univ. et al. Contact A. Emre Harmanci,
Istanbul Technical Univ., Bilgi Islem
Merkezi, Ayazaga, 80626 Istanbul, Turkey, 
phone 090 (1) 176-3254, fax 090 (1) 176-1734, 
e-mail harmanci@tritu.bitnet. 

Compsac 90, 14th Int’l Computer
Software and Applications Conf., Oct.
31-Nov. 2, Chicago. Contact Ifay F. Chang,
Rm. 1B28, IBM T.J. Watson Research Center, 
PO Box 714, Yorktown Heights, NY 10595, 
phone (914) 789-7825. 


November 1990 


14th SCAMC, 1990 Symp. on Computer Ap¬ 
plications in Medical Care, Nov. 4-7, Wash¬ 
ington, DC. Cosponsors: George Washington 
Univ. Medical Center et al. Contact SCAMC,
Office of CEM, George Washington Univ.
Medical Center, 2300 K St. NW, Washington, 
DC 20037, phone (202) 994-8928. 

24th Asilomar Conf. on Signals, Systems, 
and Computers, Nov. 5-7, Pacific Grove, 
Calif. Sponsors: Naval Postgraduate School et 
al. Contact George M. Dillard, Naval Ocean 
Systems Center, San Diego, CA 92152-5000, 
phone (619) 553-2478. 

1990 IFIP-IEEE Int’l Workshop on
Defect and Fault Tolerance in VLSI
Systems, Nov. 5-7, Grenoble, France. Contact
Gabriele Saucier, Inst. National Polytechnique
de Grenoble/CSI, 46 avenue Felix-Viallet, 
38031 Grenoble Cedex, France, phone (33) 76- 
57-46-87, fax (33) 76-50-23-21; or Tulin E. 
Mangir, TRW, 1 Space Park, R2/2036, Re¬ 
dondo Beach, CA 90278, phone (213) 813- 
3894, fax (213) 813-3709. 

ICCS 90, Int’l Conf. on Communication Systems,
Nov. 5-9, Singapore. Cosponsors: IEEE
Singapore Section et al. Contact ICCS 90, c/o 
Meeting Planners Pte. Ltd., 100 Beach Rd. 
#33-01, Shaw Towers, Singapore 0718. 

Second SIAM Conf. on Linear Algebra in 
Signals, Systems, and Control, Nov. 5-9, San 
Francisco, Calif. Sponsor: Society for Indus¬ 
trial and Applied Mathematics. Contact SIAM, 
3600 University City Science Center, Phila¬ 
delphia, PA 19104-2688, phone (215) 382- 
9800, fax (215) 386-7999, e-mail
siam@wharton.upenn.edu.

ICCC 90, 10th Int’l Conf. on Computer
Communication, Nov. 5-9, New Delhi, India. 
Sponsor: Int’l Council on Computer Commu¬ 
nication. Contact Saroj Chowla or P.P. Gupta, 
ICCC 90, CMC Ltd., A-5 Ring Rd., South Ex¬ 
tension Part I, New Delhi 110 049, India, phone 
91 (11) 626-807, fax 91 (11) 684-4652. 


Intelligent Robotic Systems: Design and 
Applications, Nov. 6-7, Philadelphia. Spon¬ 
sor: SPIE. Contact Mohan M. Trivedi, Univ. of 
Tennessee, Electrical and Computer Engineering,
Ferris Hall, Knoxville, TN 37996-2100,
phone (615) 974-5450. 


TAI 90, Second Computer Society Int’l Conf. on
Tools for Artificial Intelligence, Nov. 6-9,
Washington, DC. Cosponsors: Rutgers Univ. et al.
Contact Nikolas G.
Bourbakis, George Mason Univ., ECE Dept., 
Fairfax, VA 22030, phone (703) 425-3930. 


IEEE Workshop on the Management
of Replicated Data, Nov. 7-9, Houston.
Sponsor: IEEE Technical Committee on Operating
Systems. Contact Jehan-Francois Paris,
Computer Science Dept., Univ. of Houston, 
Houston, TX 77204-3475, phone (713) 749- 
3943, e-mail paris@cs.uh.edu; or Luis-Felipe 
Cabrera, IBM Almaden Research Center, 650 
Harry Rd., MC K55/803, San Jose, CA 95120- 
6099, phone (408) 927-1838. 


1990 IEEE Workshop on VLSI Signal Pro¬ 
cessing, Nov. 7-9, San Diego, Calif. Contact 
Patti Fenstermacher, AT&T Bell Labs, 1243 S. 
Cedar Crest Blvd., Allentown, PA 18103, e- 
mail psf@aloft.att.com; or Howard S. Mosco- 
vitz, AT&T Bell Labs, 1243 S. Cedar Crest 
Blvd., Allentown, PA 18103, e-mail
mosc@aloft.att.com.


Int’l Workshop on Network and Operating 
System Support for Digital Audio and 
Video, Nov. 8-9, Berkeley, Calif. Sponsor: 
Int’l Computer Science Inst. Contact Ramesh 
Govindan, ICSI, 1947 Center St., Suite 600, 
Berkeley, CA 94704-1105, phone (415) 642- 
4274, ext. 136, e-mail
av-workshop@berkeley.edu.

Fourth Southeastern Small-College Com¬ 
puting Conf., Nov. 9-10, Hickory, N.C. Spon¬ 
sor: Consortium for Computing in Small Col¬ 
leges. Contact Susan Dean, Samford Univ., 
800 Lakeshore Dr., Birmingham, AL 35229. 


ICCAD 90, IEEE Int’l Conf. on Computer-Aided
Design, Nov. 11-15, Santa Clara, Calif.
Cosponsor: IEEE Circuits and Systems Society.
Contact Pat Pistilli, MP Associates, 7490
Clubhouse Rd., Suite 102, Boulder, CO 80301,
phone (303) 530-4562 or 4333.


Vision 90, Nov. 12-15, Detroit. Cosponsors: 
Society of Manufacturing Engineers and SME 
Machine Vision Assoc. Contact Lisa Macha- 
cki, Vision 90, SME Conf. Dept., PO Box 930, 
Dearborn, MI, phone (313) 271-1500, ext. 369. 


Supercomputing 90, Nov. 12-16, New
York City. Cosponsor: ACM. Contact
Joanne L. Martin, IBM T.J. Watson Research
Center, PO Box 218, Route 134, Yorktown
Heights, NY 10598, phone (914) 945-3285,
e-mail jlmart@ibm.com; or Supercomputing 90,
IEEE Computer Society, 1730 Massachusetts 
Ave. NW, Washington, DC 20036-1903, 
phone (202) 371-1013. 


Seventh Governor’s Symp. on High
Technology, Nov. 13-15, Kauai, Hawaii.
Sponsor: State of Hawaii. Contact William M. 
Ball, State of Hawaii, 300 Kahelu St., Suite 35, 
Mililani, HI 96789, phone (808) 625-5293. 


Fall Comdex, Nov. 13-17, Las Vegas. Contact 
Interface Group, 300 First Ave., Needham, 
MA 02194, phone (617) 449-6600. 


PRICAI 90, Pacific Rim Int’l Conf. on
Artificial Intelligence 90, Nov. 14-16,
Nagoya-shi, Aichi, Japan. Sponsor: Japanese
Society for Artificial Intelligence et al. Contact 
Teruo Fukumura, Inter Group Corp., Akasaka 
Yamakatsu Bldg., 8-5-32 Akasaka, Minato- 
ku, Tokyo 107, Japan, phone (03) 479-5535. 


14th Western Educational Computing 
Conf., Nov. 15-16, Irvine, Calif. Sponsor: 
California Educational Computing Consor¬ 
tium. Contact Oliver Seely, Jr., California 
State Univ. at Dominguez Hills, Chemistry, 
1000 E. Victoria St., Carson, CA 90747. 


AIDA 90, Sixth Conf. on Artificial Intelligence
and Ada, Nov. 15-16, Reston, Va. Sponsors:
George Mason Univ. et al. Contact AIDA 90,
Computer Science Dept., George Mason
Univ., 4400 University Dr., Fairfax, VA
22030, phone (703) 323-2713, fax (703) 323-2630,
e-mail aida@gmuvax.gmu.edu.


Cognitiva 90, Nov. 20-23, Madrid.
Sponsor: AFCET. Contact Cognitiva 90,
c/o Assoc. Francaise pour la Cybernetique
Economique et Technique, 156 Bd. Pereire,
75017 Paris, France, phone 33 (1) 4766-2419,
fax 33 (1) 4267-9312.


AI 90, Australian Joint Artificial Intelligence
Conf., Nov. 21-23, Perth, Western Australia.
Sponsor: Australian Computer Society.
Contact Les Kitchen, Univ. of Western Australia,
Computer Science Dept., Nedlands, Western
Australia, 6009, phone 61 (9) 380-2281,
e-mail ai90@wacsvax.oz.au.

IEEE 1990 Conf. on Software Maintenance,
Nov. 26-29, San Diego, Calif.
Contact Thomas M. Pigoski, USN, NSGD
Pensacola, Corry Station, Pensacola, FL
32511, phone (904) 452-6399.

NIPS 90, IEEE Conf. on Neural Information
Processing Systems, Nov. 26-29, Denver,
Colo. Contact Kathie Hibbard, Engineering
Center, Univ. of Colorado, Campus Box 425,
Boulder, CO 80309-0425.


Micro 23, 23rd Symp. and Workshop
on Microprogramming and Microarchitecture,
Nov. 27-29, Orlando, Fla. Cosponsor: ACM.
Contact Chris Papachristou,
Case Western Reserve Univ., Computer Engineering
and Science Dept., Cleveland, OH
44106, phone (216) 368-5277, e-mail
cap@alpha.ces.cwru.edu.


16th Conf. of the IEEE Industrial Electronics
Society, Nov. 27-30, Pacific Grove, Calif.
Contact Robert Begun, 23609 Skyview Terr.,
Los Gatos, CA 95030, phone (408) 353-1560.


IAPR Workshop on Machine Vision Applications,
Nov. 28-30, Tokyo. Sponsor: Int’l
Assoc. for Pattern Recognition. Contact Mikio
Takagi, Inst. of Industrial Science, Univ. of
Tokyo, 7-22-1 Roppongi, Minatoku, Tokyo
106, Japan, phone 81 (3) 479-0289, fax 81 (3)
423-2834, e-mail takagi@tkl.iis.u-tokyo.ac.jp.


December 1990


First Int’l Symp. on Uncertainty and
Analysis: Fuzzy Reasoning, Probabilistic
Methods, and Risk Management, Dec.
3-5, College Park, Md. Sponsors: Univ. of
Maryland et al. Contact Bilal M. Ayyub, Civil
Engineering Dept., Univ. of Maryland,
College Park, MD 20742.

ACM SIGSoft 90, Fourth Symp. on Software
Development Environments, Dec. 3-5,
Irvine, Calif. Sponsor: ACM. Contact Dewayne
E. Perry, AT&T Bell Labs, 600 Mountain Ave.,
Murray Hill, NJ 07974, phone (201) 582-2529.


Sixth Computer Security Applications
Conf., Dec. 3-7, Tucson, Ariz. Sponsors:
American Society for Industrial Security et al.
Contact Marshall D. Abrams, Mitre Corp.,
7525 Colshire Dr., M/S Z269, McLean, VA
22101, phone (703) 883-6938, e-mail
abrams@mitre.org.


Tri-Ada 90, Dec. 3-7, Baltimore, Md.
Sponsor: ACM. Contact Erhard Ploedereder, Tartan
Labs, 300 Oxford Dr., Monroeville, PA 15146,
phone (412) 856-3600, fax (412) 856-3636,
e-mail ploedere@tartan.com or
ploedere@ajpo.sei.cmu.edu.


ICCV 90, Third Int’l Conf. on Computer
Vision, Dec. 4-7, Osaka, Japan.
Contact ICCV 90, IEEE Computer Society, 
1730 Massachusetts Ave. NW, Washington, 
DC 20036-1903, phone (202) 371-1013. 

SEARCC 90, South East Asia Regional 
Computer Confederation Conf., Dec. 4-8, 
Manila. Sponsor: Philippine Computer Soci¬ 
ety. Contact Victor B. Gruet, Computer Infor¬ 
mation Systems, CIS Bldg., Meralco Com¬ 
pound, Ortigas Ave., 1602 Pasig, Metro Ma¬ 
nila, Philippines, phone (632) 722-1251, fax 
(632) 722-0141. 

11th Real-Time Systems Symp., Dec.
5-7, Orlando, Fla. Sponsor: IEEE Computer
Society Technical Committee on Real-
Time Computing. Contact Doug Locke, IBM 
— MS 409, Systems Integration Div., 6600 
Rockledge Dr., Bethesda, MD 20817, phone 
(301) 493-1496, e-mail cdl@cs.cmu.edu. 


CASE 90, Fourth Int’l Workshop on 
Computer-Aided Software Engineer¬ 
ing, Dec. 5-8, Irvine, Calif. Contact Elliott J. 
Chikofsky, Radius Systems, 75 Lexington St., 
Burlington, MA 01803, phone (617) 494-8200.


San Diego Workshop on Volume Visualization,
Dec. 10-12, La Jolla, Calif.
Cosponsor: ACM. Contact T. Todd Elvins, 
SDSC, Box 85608, San Diego, CA 92038, 
phone (619) 534-5128. 

Second IEEE Symp. on Parallel and
Distributed Processing, Dec. 10-12,
Dallas. Cosponsor: Dallas Chapter of the IEEE
Computer Society. Contact Behrooz Shirazi, 
Computer Science Dept., Southern Methodist 
Univ., 6425 Airline Rd., Dallas, TX 75205- 
2337, phone (214) 692-2874, e-mail shirazi% 


ICDT 90, Third Int’l Conf. on Database Theory,
Dec. 11-15, Paris. Sponsor: INRIA. Contact
INRIA, Domaine de Voluceau - Rocquencourt,
BP 105, 78153 Le Chesnay Cedex,
France, phone 33 (1) 3963-5500. 

10th Conf. on Foundations of Software 
Technology and Theoretical Computer Sci¬ 
ence, Dec. 17-19, Bangalore, India. Contact 
Y.N. Srikant, Indian Inst. of Science,
Bangalore 560 012, India, phone (812) 334-411.

1990 IEEE Workshop on Languages
and Architectures for Automation,
Dec. 19-21, Honolulu, Hawaii. Sponsors:
Pacific Int’l Center for High Technology
Research et al. Contact D.Y.Y. Yun, Univ. of
Hawaii, 711 Kapiolani Blvd., Suite 200,
Honolulu, HI 96813-5249, phone (808) 539-1532,
fax (808) 941-1399; or Shi-Kuo Chang, 322 
Alumni Hall, Univ. of Pittsburgh, Pittsburgh, 
PA 15260, phone (412) 624-8493, fax (412) 
624-8465, e-mail chang@vax.cs.pitt.edu. 


January 1991 


Fourth CSI/IEEE Int’l Symp. on VLSI
Design, Jan. 5-8, New Delhi. Sponsors: 
Computer Society of India et al. Contact Yash- 
want K. Malaiya, Computer Science Dept., 
Colorado State Univ., Fort Collins, CO 80523, 
phone (303) 491-7031, fax (303) 491-2293, e- 
mail malaiya@ravi.cs.colostate.edu; or D. 
Roy Chowdhury, Gateway Design Automa¬ 
tion, SDF#A-1, Noida Export Processing 
Zone, PO NEPZ, Noida 201305, India, phone 
91 (05736) 62342, fax 91 (05736) 62343. 

Int’l Conf. on Multimedia Information
Systems, Jan. 16-18, Singapore.
Contact Juzar Motiwalla, Inst. of Systems Science,
Nat’l Univ. of Singapore, Heng Mui
Keng Terr., Kent Ridge, Singapore 0511, 
phone (65) 772-2075. 


PADS, Workshop on Parallel and Distributed
Simulation, Jan. 21-23, Anaheim, Calif.
Cosponsors: ACM, SCS. Contact David M. Nicol,
Computer Science Dept.,
College of William and Mary, Williamsburg, 
VA 23185, phone (804) 221-3458, e-mail 
nicol@cs.wm.edu. 

IEEE Int’l Conf. on Wafer Scale Integration,
Jan. 29-31, San Francisco,
Calif. Cosponsors: IEEE Components, Hy¬ 
brids, and Manufacturing Technology Soci¬ 
ety. Contact Terry Chappell, 730 Encino Dr., 
Aptos, CA 95003, phone (408) 662-1936; or 
R. Mike Lea, Brunel Univ., Uxbridge UB8
3PH, UK, phone (44) 895-74000, ext. 2821,
fax (44) 895-58728, e-mail
mike.lea@brunel.ac.uk.


February 1991 


Fifth Int’l Conf. on Modeling Techniques
and Tools for Computer Performance 
Evaluation, Feb. 13-15, Torino, Italy. Contact 
Maria Carla Calzarossa, Dip. di Informatica e 
Sistemistica, Univ. di Pavia, Via Abbiate- 
grasso, 209, 27100 Pavia, Italy, phone 39 (382) 
391-350, fax 39 (382) 422-881, e-mail 
mcc@ipvpel.infn.it.


CAIA 91, Seventh IEEE Conf. on Arti- 
ficial Intelligence Applications, Feb. 
24-28, Miami Beach, Fla. Contact IEEE Com¬ 
puter Society, 1730 Massachusetts Ave. NW, 
Washington, DC 20036-1903, phone (202) 
371-1013. 


Fourth Topical Meeting on Robotics and 
Remote Systems for Hazardous Environ¬ 
ments, Feb. 24-28, Albuquerque, N.M. Con¬ 
tact Raymond W. Harrigan, Div. 1414, Sandia 
Nat’l Labs, Albuquerque, NM 87185, phone 
(505) 846-6278, fax (505) 846-7425. 

EDAC 91, European Design Automation 
Conf., Feb. 25-28, Amsterdam. Cosponsor: 
IEEE. Contact Secretariat, EDAC 91, CEP 
Consultants, 26-28 Albany St., Edinburgh 
EH1 3QH, Scotland, phone 44 (31) 557-2478, 
fax 44 (31) 557-5749. 


Compcon Spring 91, Feb. 25-Mar. 1,
San Francisco. Contact Compcon Spring
91, IEEE Computer Society, 1730 Massachu¬ 
setts Ave. NW, Washington, DC 20036-1903, 
phone (202) 371-1013. 


March 1991 


Fifth Int’l Workshop on High-Level
Synthesis, Mar. 3-6, Buhlerhohe, West
Germany. Cosponsors: IEEE et al. Contact 
Raul Camposano, IBM T.J. Watson Research 
Center, PO Box 218, Yorktown Heights, NY 
10598, phone (914) 945-3871, e-mail 
raulc@ibm.com. 


Third IEE Conf. on Telecommunications, 
Mar. 17-20, Edinburgh, Scotland. Sponsor: 
Inst. of Electrical Engineers. Contact Conf.
Services, IEE, Savoy Place, London WC2R
0BL, UK, phone 44 (71) 240-1871, fax 44 (71) 
240-7735. 

Fifth SIAM Conf. on Parallel Processing 
and Scientific Computing, Mar. 22-24,
Houston. Sponsor: Society for Industrial and
Applied Mathematics. Contact SIAM, 3600 
University City Science Center, Philadelphia, 
PA 19104-2688, phone (215) 382-9800, fax 
(215) 386-7999, e-mail
siam@wharton.upenn.edu.

Advanced Research in VLSI Conf., Mar. 25-27,
Santa Cruz, Calif. Contact Carlo H. Sequin,
Univ. of California, CS Div., 529B Evans Hall, 
Berkeley, CA 94720. 

CEEDA 91, Int’l Conf. on Concurrent Engineering
and Electronic Design Automation,
Mar. 26-28, Bournemouth, Dorset, UK. Contact
S. Medhat, Dorset Inst., Electronic and
Engineering Dept., Wallisdown Road, Wallisdown,
Poole BH12 5BB, UK, phone (0202) 595-494.

Fifth Parallel Processing Symp., Mar. 27-29,
Newport Beach, Calif. Contact Larry H.
Canter, Computer Systems Approach, 1140 S. 
Raymond Ave., Suite B, Fullerton, CA 92631. 

10th IEEE Int’l Phoenix Conf. on Computers
and Communications, Mar. 27-30,
Scottsdale, Ariz. Sponsors: IEEE, IEEE Communications
Society. Contact Oris Friesen,
Bull HN, PO Box 8000, MS A93, Phoenix, AZ 
85066, phone (602) 862-5200, e-mail 
friesen@system-m.phx.bull.com. 


April 1991 


Second Int’l Symp. on Database Systems
for Advanced Applications, Apr.
2-4, Tokyo. Sponsor: IPSJ. Contact Yahiko 
Kambayashi, Computer Science Dept., 
Kyushu Univ., 6-10-1 Hakozaki, Higashi 
Fukuoka 812, Japan, phone 81 (92) 641-1101, 
ext. 5407; or Yoshifumi Masunaga, Univ. of 
Library and Information Science, 1-2 Kasuga, 
Tsukuba, Ibaraki 305, Japan, phone 81 (298) 
52-0511, ext. 340, fax 81 (298) 52-4326, e- 
mail masunaga@ulis.ac.jp. 

Third Symp. on Integrated Ferroelectrics, 
Apr. 3-5, Colorado Springs, Colo. Contact 
Conf. Secretary, Microelectronics Research 
Lab, Univ. of Colorado at Colorado Springs, 
PO Box 7150, Colorado Springs, CO 80933- 
7150, phone (719) 593-3488, fax (719) 594- 
4257. 


1991 IEEE Int’l Conf. on Robotics and
Automation, Apr. 7-12, Sacramento, Calif. 
Sponsor: IEEE Robotics and Automation Soci¬ 
ety. Contact Robotics and Automation, PO 
Box 3216, Silver Spring, MD 20918, phone 
(301) 434-1990. 

First IEEE Int’l Workshop on Interoperability
in Multidatabase Systems,
Apr. 8-9, Kyoto, Japan. Contact Ahmed K.
Elmagarmid, Purdue Univ., Computer Sciences
Dept., West Lafayette, IN 47907, phone (317) 
494-1998; or Yutaka Matsushita, Instrumentation
Dept., Keio Univ., Hiyoshi, Yokohama,
Japan, phone 81 (44) 63-1141, ext. 3564. 

Seventh Int’l Conf. on Data Engineering,
Apr. 8-12, Kobe, Japan. Contact
Ming T. (Mike) Liu, Computer and Informa¬ 
tion Science Dept., Ohio State Univ., 2036 Neil 
Ave., Columbus, OH 43210-1277, phone 
(614) 292-1837, e-mail
liu@cis.ircc.ohio-state.edu; or Data Engineering 91,
IEEE Computer Society, 1730 Massachusetts Ave. NW,
Washington, DC 20036-1903, phone (202) 
371-1013, fax (202) 728-0884. 

IFIP Working Conf. on Modeling in Com¬ 
puter Graphics, Apr. 8-12, Tokyo. Sponsor: 
IFIP TC 5/WG 5.10. Contact Tosiyasu L.
Kunii, Information Science Dept., Faculty of
Science, Univ. of Tokyo, 7-3-1 Hongo, Bunkyo-
ku, Tokyo 113, Japan, phone 81 (3) 816-1783, 
fax 81 (3) 818-4607, e-mail
b39756@tansei.cc.u-tokyo.ac.jp.

RTA 91, Fourth Int’l Conf. on Rewriting 
Techniques and Applications, Apr. 10-12,
Como, Italy. Sponsors: State Univ. of Milan.
Contact G. Degli Antoni or Marelva Bianchi, 
Dip. di Scienze Dell’ Informazione, Univ. di 
Milano, Via Milano Moretto da Brescia 9,
I-20133 Milano, Italia, phone 39 (02) 7575-201,
fax 39 (02) 7611-0556, e-mail
gdantoni@imisiam.bitnet.


ETC 91, 1991 European Test Conf.,
Apr. 17-19, Munich, West Germany.
Sponsor: VDE (Zentralstelle Tagungen und 
Seminare). Contact Peter Stilke, VDE, Strese- 
mannallee 15, D-6000 Frankfurt 70, West Ger¬ 
many, phone (69) 6308-203, fax (69) 6308- 
273. 


Second European Distributed Memory 
Computing Conf., Apr. 22-24, Munich, West 
Germany. Cosponsors: Gesellschaft für Informatik
et al. Contact Arndt Bode, Computer Science,
Technische Univ. Munich, POB 20-24-20,
D-8000 Munich 2, Federal Republic of Germany,
phone 49 (89) 2105-8240, e-mail
bode@infovax.informatik.tu-muenchen.dbp.de.


CHI 91, 1991 Conf. on Human Factors
in Computing Systems, Apr. 27-May 2,
New Orleans. Sponsor: ACM. Contact Keith 
Butler, Boeing, Ad Technology Center, PO 
Box 24346, M/S 7L-64, Seattle, WA 98124, 
phone (206) 865-3389; or June Davis, 13 An¬ 
napolis St., Annapolis, MD 21401, phone 
(301) 269-6801. 


May 1991 


ICSE 13, 13th Int’l Conf. on Software
Engineering, May 13-16, Austin,
Texas. Cosponsor: ACM. Contact ICSE 13,
Bryan Fugate, MCC, 3500 W. Balcones Center 
Dr., Austin, TX 78759-6509, phone (512) 338- 
3330; MCC, PO Box 200015, Austin, TX 
78720-0015; or ICSE 13, IEEE Computer So¬ 
ciety, 1730 Massachusetts Ave. NW, Wash¬ 
ington, DC 20036-1903, phone (202) 371- 
1013. 


CompEuro 91, IEEE Int’l Conf. on
Advanced Computer Technology, Reliable
Systems, and Applications, May 13-17,
Bologna, Italy. Cosponsors: IEEE Region 8
et al. Contact Vito Monaco, Dip. Elettronica
Informatica e Sistemistica, Univ. di Bologna,
Viale Risorgimento, I-60136, Bologna, Italy.

CCW 91, Third IEEE Conf. on Computer
Workstations, May 15-17, Cape
Cod, Mass. Sponsor: IEEE Technical Commit¬ 
tee on Operating Systems. Contact Luis-Felipe 
Cabrera, IBM Almaden Research Center, MC 
K55/801, 650 Harry Rd., San Jose, CA 95120-
6099, phone (408) 927-1838, e-mail 
cabrera@ibm.com; or Kenneth Kane, Boston 
Development Center, Sun Microsystems, 2 
Federal St., Billerica, MA 01802, phone (508) 
671-0367, e-mail kkane@east.sun.com. 

SESAW, Software Engineering Standard
Application Workshop, May 20-24,
San Diego, Calif. Contact Vera Edelstein,
Nynex, 500 Westchester Ave., White Plains, 
NY 10604, phone (914) 683-2888. 

Second Int’l Conf. on Algebraic Methodology
and Software Technology, May 22-24,
Iowa City, Iowa. Contact Teodor Rus, Computer
Science Dept., Univ. of Iowa, Iowa City,
IA 52242, phone (319) 335-0694, e-mail
rus@herky.cs.uiowa.edu.

Melecon 91, Fifth Mediterranean Electro¬ 
technical Conf., May 22-24, Ljubljana, Yugo¬ 
slavia. Cosponsors: IEEE Region 8 Yugosla¬ 
via Section et al. Contact Melecon 91 Secretar¬ 
iat, Fakulteta za elektrotehniko, Trzaska 25, 
61001 Ljubljana, Yugoslavia, phone 38 (61) 
265-161, fax 38 (61) 264-990. 

ISCA 18, 18th Int’l Symp. on Computer
Architecture, May 26-30,
Toronto, Canada. Cosponsor: ACM. Contact
K.C. Smith, Univ. of Toronto, Electrical Engi¬ 
neering Dept., Toronto, Ont. M5S 1A4, Can¬ 
ada, phone (416) 978-5033. 


June 1991 


Fourth Int’l Conf. on Industrial and
Engineering Applications of Artificial
Intelligence and Expert Systems, June 2-5,
Kauai, Hawaii. Sponsors: ACM et al. Contact
Moonis Ali, Univ. of Tennessee Space Inst., 
MS 15, B.H. Goethert Pkwy., Tullahoma, TN 
37388-8897, phone (615) 455-0631, ext. 236, 
fax (615) 454-2354, e-mail alif@utsivl.bitnet. 


Symp. on Solid Modeling Foundations and
CAD/CAM Applications, June 5-7, Austin,
Texas. Sponsor: ACM SIGGraph. Contact
Joshua Turner, CII 7015, RDRC, Rensselaer
Polytechnic Inst., Troy, NY 12180-3590,
phone (518) 276-6751, fax (518) 276-2702,
e-mail jturner@rdrc.rpi.edu.

PARLE 91, Conf. on Parallel Architectures
and Languages Europe, June 10-13, Eindhoven,
The Netherlands. Cosponsors: Commission
of European Communities et al. Contact
F. Stoots, Philips Research Labs, PO Box
80.000, 5600 JA Eindhoven, The Netherlands,
fax 31 (40) 744-758, e-mail
stoots@dooma.prl.philips.nl.

ISCAS 91, 24th IEEE Int’l Symp. on Circuits
and Systems, June 11-14, Singapore.
Sponsor: IEEE Circuits and Systems Society.
Contact ISCAS 91 Secretariat, Communication
Int’l Associates, 44/46 Tanjong Pagar Rd.,
Singapore 0208, phone (65) 226-2823, fax (65)
226-2877.


SCM 3, Third Int’l Software Configuration
Management Workshop, June
12-14, Trondheim, Norway. Cosponsors:
ACM et al. Contact Reidar Conradi, Computer
Systems and Telematics Div., Norwegian Inst.
of Technology, N-7034 Trondheim, Norway,
phone 47 (7) 593-444; or Peter Feiler, Software
Engineering Inst., Carnegie Mellon Univ.,
Pittsburgh, PA 15213-3890, phone (412) 268-7790,
e-mail phf@sei.cmu.edu.

DAC 91, 28th ACM/IEEE Design
Automation Conf., June 16-21,
Orlando, Fla. Cosponsor: ACM. Contact Pat
Pistilli, MP Associates, 7490 Clubhouse Rd.,
Suite 102, Boulder, CO 80301, phone (303)
530-4333.


CG Int’l 91, June 22-28, Cambridge, Mass.
Cosponsors: Computer Graphics Society,
MIT. Contact Barbara Dullea, Ocean Engineering
Dept., MIT Rm. 5-435, 77 Massachusetts
Ave., Cambridge, MA 02139, fax (617)
253-8125, e-mail barbara@deslab.mit.edu.


1991 IEEE Int’l Symp. on Information Theory,
June 23-29, Budapest, Hungary. Contact
Anthony Ephremides, Electrical Engineering
Dept., Univ. of Maryland, College Park, MD
20742, phone (301) 454-6871, e-mail
tony@eng.umd.edu.


First Int’l Conf. on Artificial Intelligence in
Design, June 25-27, Edinburgh, Scotland.
Contact Helen Hodge or Tom Whiting, Butterworth
Scientific, Westbury House, Bury
Street, Guildford, Surrey, GU2 5BH, UK,
phone (0483) 300-966, fax (0483) 301-563.


10th Symp. on Computer Arithmetic,
June 26-28, Grenoble, France. Cosponsors:
ACM et al. Contact Jean-Michel Muller,
Lab. LIP-IMAC, Ens. Lyon, 69364 Lyon
Cedex 07, France, phone 33 (72) 72-8229.


July 1991


IEE Bicentennial Conf. on Computing, July 
1-3, London. Contact Conf. Services, Institu¬ 
tion of Electrical Engineers, Savoy Place, Lon¬ 
don WC2R 0BL, UK, phone 44 (71) 240-1871, 
fax 44 (71) 240-7735. 

Second Int’l Conf. on Industrial and Ap¬ 
plied Mathematics, July 8-12, Washington, 
DC. Sponsor: Society for Industrial and Ap¬ 
plied Mathematics. Contact SIAM, 3600 Uni¬ 
versity City Science Center, Philadelphia, PA 
19104-2688, phone (215) 382-9800, fax (215) 
386-7999, e-mail siam@wharton.upenn.edu. 


IEEE COMPUTER SOCIETY
Membership / Subscription Application 



BENEFITS 



Computer 

Computer comes automatically 
with membership. Written, 
reviewed, and refereed by 
experts, it features survey and 
tutorial articles covering the 
entire computer field, and 
departments such as new 
products, new product reviews, 
standards, and a reader forum 
called “The Open Channel”
(monthly). 


Technical Committees 

Participate in one or more of our 33 technical 
committees — networks of professionals with common 
interests in specialty areas within computer hardware, 
software, and applications. 

Standards Working Groups 
Participate in the development of the more than 100 
standards projects currently sponsored by the society 
in such diverse areas as software engineering, local 
area networks, microprocessor buses, design automa¬ 
tion, programming languages, and standards 
definitions. 

Computer Society Press Books 

Receive discounts of up to 50% on over 600 titles 
covering a broad spectrum of computer science topics 
such as networking, communications, advanced 
systems, image processing, security, artificial 
intelligence, and design automation. Over 60 new titles 
are published annually. 

Conferences and Tutorials 
Choose from more than 100 conferences annually, 
ranging from large industry-oriented conferences 
replete with exhibits to small, highly interactive 
workshops. Members receive special low rates. 


Schedule of Fees 


To join: see item 1, 2, or 3. 
To subscribe: see item 4. 

Membership dues and periodical subscriptions are annualized to, and expire on,
December 31. Choose full- or half-year rate schedules depending on date of
receipt by the Computer Society as indicated below.

                                                          Half Year      Full Year
                                                          Mar 1-Aug 31   Sept 1-Feb 28

1. I don’t belong to the IEEE and I want
   to join just the Computer Society                      □ $23.50       □ $47.00

2. I don’t belong to the IEEE and I want
   to join both the Computer Society and the IEEE*
   I reside in Region 1-6 (United States).                □ $47.50       □ $95.00
   I reside in Region 7 (Canada).                         □ $43.50       □ $87.00
   I reside in Region 8 (Europe, Africa, or the
   Middle East).                                          □ $43.00       □ $86.00
   I reside in Region 9 (Latin America).                  □ $39.50       □ $79.00
   I reside in Region 10 (Asia and Pacific).              □ $38.50       □ $77.00

le Computer Society may deduct $5 off the

3. I already belong to the IEEE and I want
   to join the Computer Society.                          □ $9.00        □ $18.00
   IEEE Member Number


OPTIONAL PERIODICALS for new or current members

                                                    Issues     Half       Full
                                                    per year   year       year
IEEE Computer Graphics and Applications (3061)      6          □ $10.00   □ $20.00
IEEE Design and Test (3111)                         6          □ $10.50   □ $21.00
IEEE Expert (3151)                                  6          □ $9.00    □ $18.00
IEEE Micro (3071)                                   6          □ $9.50    □ $19.00
IEEE Software (3121)                                6          □ $10.00   □ $20.00
Transactions on Computers (1161)                    12         □ $10.00   □ $20.00
Transactions on Knowledge and
  Data Engineering (1471)                           4          □ $5.00    □ $10.00
Transactions on Parallel and
  Distributed Systems (1501)                        4          □ $5.50    □ $11.00
Transactions on Pattern Analysis and
  Machine Intelligence (1351)                       12         □ $10.00   □ $20.00
Transactions on Software Engineering (1171)         12         □ $10.00

Total amount remitted with this application $_ 

□ Checks are accepted in Belgian, British, German, Swiss, Japanese, or 
U.S. currencies. U.S. checks must be drawn on a U.S. bank. 

□ Visa □ Master Card □ American Express □ Eurocard 



PRICES EXPIRE 12/31/90 


Charge Card Number 


d, if elected, will be governed by IEEE's and the society’s constitutions, bylaws, and statements of 


MAILING ADDRESS 


EDUCATION (highest level completed) _ 


Return to: IEEE Computer Society, 10662 Los Vaqueros Circle, P.O. Box 3014, Los Alamitos, CA 90720-1264 USA. pcc

Residents of Europe mail to: IEEE Computer Society, 13, Avenue de l’Aquilon, B-1200, Brussels, BELGIUM.

Asian / Pacific residents mail to: IEEE Computer Society, Ooshima Building, 2-19-1 Minami-Aoyama, Minato-ku, Tokyo 107 JAPAN. 













































CAREER OPPORTUNITIES 


RATES: $12.00 per line (ten-line minimum).
Average five typeset words per
line, eight lines per column inch. Add 
$10 for box number. Send copy at least 
one month prior to publication date to: 
Marian B. Tibayan, Classified Adver¬ 
tising, COMPUTER Magazine, 10662 
Los Vaqueros Circle, PO Box 3014, 
Los Alamitos, CA 90720-1264; (714) 
821-8380; fax (714) 821-4010. 

In order to conform to the Age Discrimina¬ 
tion in Employment Act and to discourage 
age discrimination, COMPUTER may re¬ 
ject any advertisement containing any of 
these phrases or similar ones: "...recent
college grads...," "...1-4 years maximum
experience...," "...up to 5 years experience,"
or "...10 years maximum
experience." COMPUTER reserves the
right to append to any advertisement, with¬ 
out specific notice to the advertiser, 
"Experience ranges are suggested mini¬ 
mum requirements, not maximums." 
COMPUTER assumes that, since advertis¬ 
ers have been notified of this policy in 
advance, they agree that any experience re¬ 
quirements, whether stated as ranges or 
otherwise, will be construed by the reader 


DIRECTOR OF ADMINISTRATION 
& BUSINESS OPERATIONS 

Information Sciences Institute of the 
University of Southern California, located in 
Marina del Rey, CA, performs independent 
computer and information sciences R&D for 
government agencies and has an Informa¬ 
tion Processing Center that supplies com¬ 
puter services via the ARPAnet for users 
spanning the USA and the UK. 

The Director of Administration will be part 
of the overall management team in helping 
to set and review scientific directions as well 
as supervise the management of all Institute 
administrative and fiscal activities to include: 
Financial, Budget, Contracts, Personnel, 
Materials & Supplies, Facilities, Telecom¬ 
munications, Legal Affairs, a Technical 
Library, and a Publications Support Group. 
Interact with all USC administrative organi¬ 
zations, and interpret and administer Univer¬ 
sity and Federal policies & procedures. 

Must have a minimum of 7 years experi¬ 
ence in research administration and all 
phases of administration and supervision in a 
non-profit research environment, strong fis¬ 
cal management experience and skills, and 
familiarity with Government research con¬ 
tract management. Should have advanced 
scientific degree with some business courses. 

Please send resume and salary history to 
Lisa Moses, Information Sciences Institute, 
4676 Admiralty Way, Suite 1001, Marina 
del Rey, CA 90292. Ref #23901. 


TRINITY COLLEGE 
Computer Science Faculty 

Trinity College is establishing a Depart¬ 
ment of Computer Science and is seeking 
candidates at any level for a tenure track 
position starting in August 1990. The success¬ 
ful candidate will join two other computer 
science faculty in the continued develop¬ 
ment of the computer science major which 
was begun in 1985 and is presently offered 
through the Department of Engineering and 
Computer Science. 

Candidates must have a Computer Sci¬ 
ence Ph.D., a strong commitment to excel¬ 
lence in undergraduate teaching, a willing¬ 
ness and ability to participate in the continued 
development of a strong liberal arts com¬ 
puter science major and the potential to pur¬ 
sue an active research program. Applicants 
in all areas of computer science will be 
considered. 

Trinity College is a selective liberal arts 
college with a strong commitment to the sci¬ 
ences. In addition to computer science, the 
College offers majors in engineering, mathe¬ 
matics, chemistry, biochemistry, physics, bi¬ 
ology and psychobiology. The College’s aca¬ 
demic computing facilities include a VAX 
8350, a network of Sun 3/50 and 3/60 
workstations and numerous personal com¬ 
puters. Trinity College is an equal opportuni¬ 
ty/affirmative action employer, and has a 
primary goal of increasing the number of 
women and minority faculty in the sciences. 
Please send application letter, vita and letters 
of reference to Professor Ralph Walde, 
Department of Engineering and Computer 
Science, Trinity College, Hartford, CT 
06106. Consideration of applications will 
begin immediately and the search will remain 
open until the position is filled. 


SOFTWARE ENGINEER 

Seattle firm seeks software engineer to 
design, develop & maintain decision support 
system software for healthcare industry utiliz¬ 
ing optimization theory, AI, expert systems, 
image processing & mathematical mor¬ 
phology. Salary $34,000, 40 hours/wk, 
8:30-5 p.m. 

Requires M.S. in computer science or 
electrical engineering, and 1 yr exp in soft¬ 
ware engineering utilizing database and net¬ 
work systems design. Also requires graduate- 
level courses (minimum 3 qtr credit each) in 
network theory, operating systems, com¬ 
puter architecture, software engineering, im¬ 
age processing, computer graphics, expert 
systems, computer vision & optimization 
algorithms. Must also have developed one 
AI-based system.

Send resume within 30 days of publication 
to Employment Security Dept., Employ¬ 
ment Service Div., Attn: JOB #198990-L, 
Olympia, WA 98504. 


EMPLOYMENT OPPORTUNITY 

Company seeks an applicant with an M.S.
degree in Electrical Engineering who has
completed college courses in advanced
microcomputer system design, advanced computer
architecture and networks, and computer
graphics, and who knows the C, Assembly, and
FORTRAN languages. Duties: design, develop,
implement, and test new computer hardware
systems, network systems, and systems
integration; design, develop, and implement
custom software graphics packages; consult on
various computer hardware and software
problems; and provide technical support to
customers. 40 hrs./week, $33,150/year.
Please send resume to:
Lawrence Employment and Training Office, 
833 Ohio, Lawrence, Kansas 66044. Tele¬ 
phone Number (913) 843-0531. RE: Job# 
KS1900738. 


UNIVERSITY OF HONG KONG 

Director of the Computer Centre 

Applications are invited for the Director¬ 
ship of the University Computer Centre, ten¬ 
able from September 1, 1990, following the 
resignation of Dr. J.T. Yuli. 

The Centre provides a full range of com¬ 
puting services for teaching, research and 
administration. The Director is responsible 
for the formulation and implementation of 
policies and plans for the development of 
computing and information technology ser¬ 
vices in the University, the maintenance and 
coordination of established computing ac¬ 
tivities, and the management of the daily af¬ 
fairs of the Centre. 

Applicants should be graduates with sig¬ 
nificant experience in a diverse computing 
environment, preferably in a tertiary educa¬ 
tion institution. Successful candidates should 
show evidence of strong managerial abilities 
and knowledge of current computing tech¬ 
nology, and be sensitive to the aims and pur¬ 
poses of academic computer applications. 

Annual salary (superannuable) will be 
within the professorial range, of which the 
minimum is HK$518,520 and the average is 
HK$641,400 (approx. US$1 = HK$7.80 as 
at March 27, 1990). At current rates, salaries 
tax will not exceed 15% of gross income. 
Housing at a charge of 7.5% of salary, chil¬ 
dren’s education allowances, leave, and 
medical benefits are provided. 

Further particulars and application forms 
may be obtained from Appointments 
(37707), Association of Commonwealth 
Universities, 36 Gordon Square, London 
WC1H 0PF, UK, or from the Appointments 
Unit, Registry, University of Hong Kong, 
Hong Kong (Fax: (852) 8582549; E-mail: 
APPTUNIT@HKUVM.BITNET). Closes 29 
June 1990. 
SENIOR SYSTEMS ANALYST 


Senior Systems Analyst for design and 
implementation of intelligent GIS and CAD 
systems, relational database management 
system design, interpreter and compiler 
construction, optimization, algorithm and 
technique design, and 3D graphics 
processing. Salary for a 40-hr week, 
Mon-Fri 9 a.m. to 5 p.m., is $30,000 
yearly. Applicants with M.S. in Computer 
Science, 1 yr exp., and knowledge of C, 
Fortran, Assembly, and SQL, and ability to 
work with Unix-based hardware and 
network environment send resumes only to: 
Job Service of Florida, 701 S.W. 27 Ave, 
Room 15, Miami, Florida 33135. Ref: Job 
Order # FL 0262037. 


EMPLOYMENT OPPORTUNITY 

Company needs an applicant with an M.S. 
in Computer Science and knowledge of 
distributed processing systems (networks), 
artificial intelligence, and the UNIX/C and 
PROLOG computer languages. Will design, 
develop, and implement various Company 
computer systems for inventory, 
accounting, sales, and other business 
applications; apply the principles of 
database design, data communication, 
computer networking with the Company’s 
headquarters in Taiwan, and heuristic 
problem solving; and be in charge of 
system analysis and system maintenance. 
40 hrs./week, $25,000/year. 
Please submit your resume to: Division of 
Employment Security, 421 East Dunklin 
Street, Jefferson City, Missouri 65101. 
ATTN: John F. Scott. RE: J.O. #364879. 


NAVAL POSTGRADUATE SCHOOL 
Department of Electrical and 
Computer Engineering 

Applications are invited for tenure-track 
faculty positions at all levels to begin Fall 
1990. At the assistant professor level, appli¬ 
cants must demonstrate superior research 
and teaching potential. At higher levels, the 
applicant must have an outstanding record 
of research and teaching achievement. 
Areas of interest include (but are not limited 
to) computer architecture, fault-tolerant 
computing, VLSI, distributed and parallel 
computing, and communications. Candi¬ 
dates should hold a Ph.D. degree as of the 
Fall 1990. The school conducts a traditional 
graduate program at the M.S. and Ph.D. 
level for military and civilians. Students in¬ 
clude officers from the Navy, other U.S. ser¬ 
vices, and allied countries. The faculty is 
predominantly civilian. The department has 
well-equipped laboratories and nationally 
recognized research programs. The appli¬ 
cant should submit a complete resume, a 
statement of teaching interests, visa status, 
and the names and addresses of three refer¬ 
ences to Professor John Powers, Chairman, 
Department of Electrical and Computer 
Engineering, Code 62, Naval Postgraduate 
School, Monterey, CA 93943-5004, 408- 
646-2081. An Equal Opportunity/Affirma¬ 
tive Action Employer. 


DEPUTY DIRECTOR 
International Computer Science 
Institute 

Nominations and Applications are solicited 
for the position of Deputy Director of the In¬ 
ternational Computer Science Institute. The 
Institute is an independent basic research 
laboratory affiliated with and physically near 
the University of California at Berkeley. 
Support comes from U.S. sources and spon¬ 
sor nations, currently Germany, Italy and 
Switzerland. 

The Deputy Director will have the primary 
responsibility for the internal administration 
of the Institute and its post-doctoral and ex¬ 
change programs with sponsor nations. 
There are also many opportunities for new 
initiatives. The position is like the chair of a 
research oriented computer science depart¬ 
ment and the ideal candidate would have 
such experience. ICSI is also expanding its 
research staff and welcomes applications 
from outstanding scientists at any post¬ 
doctoral level. 

Please respond to: 

Domenico Ferrari, 

Deputy Director 

International Computer Science Institute 

1947 Center Street, Suite 600 
Berkeley, CA 94704-1105 


NAVAL POSTGRADUATE SCHOOL 

The Computer Science Department invites 
applications for faculty positions at all levels. 
Our primary interests are in the areas of 
operating systems and programming lan¬ 
guages. Our secondary interests are in the 
areas of visual data processing, graphics, 
and computer architecture (especially real¬ 
time and parallel-processing aspects of the 
three). Applicants should have a Ph.D. in 
Computer Science or a closely related field 
and be committed to high-quality teaching 
and research. Senior applicants must have 
distinguished research records. Appoint¬ 
ments can begin at any time. 

The Department offers M.S. and Ph.D. 
degrees in computer science, but no under¬ 
graduate degrees. Currently, 110 students 
are enrolled in the M.S. program and five in 
the Ph.D. program. Students are military of¬ 
ficers or civilian employees of the Depart¬ 
ment of Defense and are fully supported by 
their sponsoring organization during their 
studies. Departmental facilities (supported 
by eight full-time computer professionals) in¬ 
clude six instructional and research labora¬ 
tories with extensive state-of-the-art equip¬ 
ment. Faculty normally teach for two quarters 
and perform research for two quarters per 
year. The Monterey-Carmel area provides a 
pleasant coastal climate and easy access to 
Silicon Valley companies. 

Send a detailed resume, an abstract of 
your best recent research, and letters of ref¬ 
erence to: 

Faculty Search Committee 
Computer Science Department, Code 52 
Naval Postgraduate School 
Monterey, CA 93943 
Telephone (408) 646-2449 
An Equal Opportunity/Affirmative Action 
Employer 


UNIVERSITY OF CONNECTICUT 
Department Head 

Computer Science & Engineering 

We are currently accepting applications 
and nominations for an anticipated position 
as Professor/Head of the Computer Science 
and Engineering (CSE) Department. Neces¬ 
sary qualifications include an earned doc¬ 
torate, distinguished achievement in educa¬ 
tion and research in computer science and/ 
or engineering and demonstrated leadership 
ability. Future leadership role is expected in 
increasing and strengthening the depart¬ 
mental research activities while maintaining 
an accredited undergraduate engineering 
program. Salary will be competitive and 
commensurate with qualifications. The CSE 
Department, together with Electrical and 
Systems Engineering, Metallurgy, Mechani¬ 
cal, Chemical and Civil Engineering, com¬ 
prise the School of Engineering. CSE has 18 
faculty positions serving approximately 280 
full-time undergraduates and 80 full-time 
graduate students while maintaining a broad 
range of research activity. Available facilities 
include Suns, parallel computers, and IBM 
3090s connected via Ethernet. The Department 
has an Industrial Advisory Board and 
wishes to enhance its cooperative research 
and education activities with industry. Ap¬ 
plication letter, resume and names of four 
references should be sent to: Dr. Howard A. 
Sholl, Department Head Search Chairman, 
Computer Science and Engineering, University 
of Connecticut, U-155, 260 Glenbrook 
Road, Storrs, CT 06269-5957 (fax (203) 
486-1273; e-mail has c@bre.uconn.edu). 
The position is available as of September 1, 
1990. Review of applications will begin on 
June 15, 1990. The University of Connec¬ 
ticut is an Affirmative Action/Equal Oppor¬ 
tunity Employer. (Search #0A220). 


EMPLOYMENT OPPORTUNITY 

Our Company needs an applicant with 
M.S. Degree in Electrical Engineering and 
who has taken college courses in computer 
graphics, machine vision and digital image 
processing and also knows how to use 
BASIC, PASCAL and C computer lan¬ 
guages. Will design, develop, implement, 
support and operate the Company’s micro¬ 
computer system in both hardware and soft¬ 
ware, and to specifically design and develop 
imaging/optical disks for data compression. 
40 hrs./week, $33,400/year. Please submit 
your resume to Manhattan Employment and 
Training Office, 621 Humboldt, Manhattan, 
Kansas 66502-0940, Tel: 913-776-8884. 
RE: Job #KS2201334. 


INDUSTRIAL POSITIONS FOR 
HARDWARE & SOFTWARE ENGRS 

Positions nationally. Fees paid by em¬ 
ployers. U.S. citizens or permanent residents. 
Degree plus minimum of two years applicable 
industrial experience. Call, write or fax, 
RSVP SERVICES, Dept. CM, Suite 614, 1 
Cherry Hill Mall, Cherry Hill, NJ 08002. 
800-222-0153. FAX: 609-667-2606. 
BOOK REVIEWS 


Editor: Guy Johnson, Department of Information Technology, Rochester Institute of Technology, 1 Lomb Memorial Drive, Rochester, NY 14623. 


Quantitative Analysis of Computer Systems 

Clement H.C. Leung (John Wiley & Sons, Chichester, England, 1988, 170 pp., $34.95) 


This book’s slim profile is misleading, 
since it is rich in useful information. Both 
professionals entering systems analysis 
and students seeking computer science 
applications of quantitative analysis will 
gain a better understanding of computer 
systems measurement after studying this 
concise text. 

The book is based on course materials 
at Reading and London universities. The 
author stresses that his treatment of the 
material is geared to computer scientists, 
not mathematicians or statisticians. He 
assumes the reader is familiar with basic 
operating systems, machine architec¬ 
tures, and data structures and has a work¬ 
ing knowledge of elementary calculus 
and basic probability. The author also 
strongly emphasizes applied knowledge, 
which is useful for anyone charged with 
measuring actual system performance, 
and he includes numerous examples us¬ 
ing mathematical techniques to deter¬ 
mine system parameters. 

The material is suitable for an ad¬ 
vanced undergraduate or graduate cur¬ 
riculum, although some students might 
find themselves reviewing elementary 
texts for additional information. The text 
could also supplement courses in data¬ 
base systems, computer networks, or op¬ 
erating systems, where performance- 
measurement issues are addressed in a 
quantitative manner. The author’s em¬ 
phasis on the immediate application of 
new knowledge makes the book suitable 
for instruction or review outside a classroom. 
Each section presenting new material 
has one or two examples underscoring 
the concepts just introduced. More 
detailed exercises follow each chapter. 

The author describes quantitative com¬ 
puter performance analysis as determin¬ 
ing a computer system’s efficiency. He 
has organized the book into three sec¬ 
tions. Chapters 1-3 consider the back¬ 
ground and motivation of quantitative 
performance analysis. The author examines 
the Poisson process and the organization 
of a queueing system in the context 
of computer system events and resource 
contention. Traffic intensity and 
throughput are calculated for various 
queues. 

Chapters 4-10 then use Little’s for¬ 
mula and the Pollaczek-Khintchine for¬ 
mula to address the prediction and model¬ 
ing of system efficiency through analytic 
description. The author points out that the 
latter formula, when generalized, per¬ 
mits the analysis of facilities with dis¬ 
similar service-time distributions, which 
is useful when considering disk storage 
units, database performance, or devices 
that need additional initialization time 
when starting from a cold state. 
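
As a rough illustration of the two formulas named above (not taken from the book under review), the sketch below applies the Pollaczek-Khintchine formula to a single-server queue with Poisson arrivals and an arbitrary service-time distribution, then uses Little's formula to turn mean delays into mean queue lengths. The function name, the dictionary keys, and the sample disk workload are assumptions chosen only for illustration.

```python
def mg1_metrics(arrival_rate, mean_service, service_second_moment):
    """Mean-value metrics for an M/G/1 queue (Poisson arrivals, one server)."""
    rho = arrival_rate * mean_service  # traffic intensity (utilization)
    if not 0 <= rho < 1:
        raise ValueError("queue is unstable unless traffic intensity < 1")
    # Pollaczek-Khintchine: mean time spent waiting before service starts
    wait_in_queue = arrival_rate * service_second_moment / (2 * (1 - rho))
    time_in_system = wait_in_queue + mean_service
    # Little's formula: mean number present = arrival rate x mean time spent
    return {
        "rho": rho,
        "Wq": wait_in_queue,
        "W": time_in_system,
        "Lq": arrival_rate * wait_in_queue,
        "L": arrival_rate * time_in_system,
    }


# Hypothetical disk: 20 requests/s, 30 ms mean service time.
# Exponential service has E[S^2] = 2 * E[S]^2; constant service has E[S^2] = E[S]^2.
exponential = mg1_metrics(20.0, 0.030, 2 * 0.030 ** 2)
deterministic = mg1_metrics(20.0, 0.030, 0.030 ** 2)
```

With the same utilization, the constant-service case waits half as long in queue as the exponential case, which is the kind of dependence on the service-time distribution that the generalized formula exposes.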


Finally, Chapters 11-14 examine empirical 
approaches related to performance 
tuning and monitoring, such as the 
use of scripts and software or hardware 
monitors. 

One of the text’s strengths is the number 
of examples, which are a great help to 
students trying to visualize the mathematical 
ideas or professionals interested 
in applying the analysis techniques to 
their own systems. The material is compact 
and comprehensive. By narrowly 
defining his topic, the author achieves his 
goal of discussing only quantitative computer 
system analysis. 

Patricia A. Morreale 

Illinois Institute of Technology 


An Implementation Guide to Real-Time 
Programming 

David L. Ripps (Prentice Hall, Englewood Cliffs, N.J., 1989, 262 pp., $49) 


The author does not mention it, but this 
book seems to be organized in three parts. 
The introduction summarizes basic concepts 
of real-time programming and 
gives an overview of tasking, using real-time 
requirements for controlling a one-way 
bridge as an example. The body of 
the book then presents various essential 
programming constructs for real-time 
applications, such as event flags, semaphores, 
messages, mailboxes, pools, I/O, 
and file systems. This part also includes a 
general chapter on task coordination 
principles developed by the author. Finally, 
there are brief (and somewhat superficial) 
descriptions of multiprocessing, 
debugging, and Ada tasking, 
three fairly independent topics that each 
could be fully covered in a separate book. 
The book would be clearer if the author 
identified these three parts, but this is not 
a very serious drawback. 

The author states that the book’s intended 
audience is software engineers 
and system programmers who design, 
code, and maintain real-time applications; 
supervisors or managers of such 
projects; and computer science majors 
taking a course in real-time programming. 
He also states that the book is “intended 
to be a self-contained introductory 
text” that will “teach the reader how 
to divide an application into separate 
tasks and then how to make sure that those 
separate tasks work together as a cohesive, 
coordinated, and unified program.” 
Indeed, both novices and skilled professionals 
will find the author’s treatment of 
the major issues comprehensive, well-structured, 
and systematic. His excellent 
background and professional expertise 
are evident. 

The book does have limitations, how¬ 
ever. First, the word “guide” in the title 
suggests that the book is general enough 
to cover various implementation meth¬ 
ods, techniques, and tools. However, the 
book concentrates on only one operating 
system (MTOS-UX) and one language 
(C). Certainly, the author has done an ex¬ 
cellent job within this limit, but consider 
that someone else could give the same 
title to a completely different book cover¬ 
ing iRMX286 and PL/M or RMS68K and 
Modula-2. 

The author stresses in his introduction 
that there are two general approaches to 
implementing real-time programs: oper¬ 
ating system-based and language-based. 
The author favors the former, although he 
admits that “time may show” the lan¬ 
guage-based approach is better. Based on 
my own professional experience, there is 
a large and growing group of scientific 
and research users who rely heavily on 
real-time applications in their work but 
for whom mastering operating-system 
internal details is not trivial. These users 
are generally familiar with programming 
from the language side, so they will likely 
opt for a language-based approach. 
Moreover, this approach is better for 
portability of both programs and pro¬ 
grammers. 

The author notes that widely accepted 
standards have not yet emerged in this 
field, but he also states that MTOS-UX 
“is a standard among real-time operating 
systems.” In fact, MTOS-UX is neither a 
formal nor a de facto standard because a 
number of other real-time operating sys¬ 
tems are used equally often by industry. 
The author himself states later that no 
such formal standard exists because there 
are conflicting views on what services 
such a standard should incorporate. However, 
he is aware of current work on such 
standards and gives clear references to 
corresponding publications. (It would be 
interesting to compare the chapters on 
MTOS-UX system calls with IEEE Project 
P1003.4 “Real-Time Extension for 
Portable Operating System,” the standard 
that seems to dominate all other 
work. A preliminary look reveals that the 
MTOS notions are conceptually not far 
behind the IEEE work.) 

The author also states that the system 
has the “richest and most comprehensive 
set of real-time facilities of any commer¬ 
cially available real-time operating sys¬ 
tems.” Even if this is so, it is arguable 
whether increasing the number of fea¬ 
tures or facilities automatically increases 
the utility of a particular operating sys¬ 
tem (or any other piece of software). 

I am reluctant to suggest this book as a 
textbook, since it confines its coverage to 
MTOS-UX and C. The book will serve 
well those readers who concentrate on 
MTOS-UX and C, but it is not general 
enough to be a complete guide for imple¬ 
menting real-time programs. That book 
has yet to be written. 


Janusz Zalewski 

Southwest Texas State University 


Systems Architecture & Systems Design 

Dimitris N. Chorafas (McGraw-Hill, New York, 1989, 494 pp., $49.95) 


Chorafas has written an insightful book 
describing the need for interface stan¬ 
dards and planning in future computer and 
network architectures. Systems Architec¬ 
ture & Systems Design lacks quantitative 
analysis, but that’s fine for an overview 
of such complex topics as expert systems, 
communication protocols, multimedia, 
and intelligent buildings and their rela¬ 
tionship to system architecture. 

The book has three sections with about 
five chapters each. The first section illus¬ 
trates why classical approaches are inva¬ 
lid for today’s solutions. Chorafas gives 
examples of horrendous problems facing 
system architects, including combining 
systems with incompatible protocols or 
operating systems, upgrading systems 
with badly dated hardware and software, 
and ominous performance problems. 

Chorafas has a vision of the future in 
which artificial intelligence, especially 
expert systems, plays a vital role in solv¬ 
ing compatibility problems. He demon¬ 
strates the need for any-to-any solutions, 
where industry standards define inter¬ 
face requirements, and he even proposes 
an eighth layer to the ISO/OSI network 
reference model. 

Chorafas devotes an entire chapter to 
artificial intelligence and its possible future 
role in systems architecture and design. 
In his view, AI should permeate every 
aspect of new systems and could be 
used to improve user interfaces, connect 
heterogeneous systems, create knowl¬ 
edge banks, etc. 

In the second section, Chorafas defines 
the development process for a communi¬ 
cations infrastructure to support systems 
architecture and systems integration. He 
highlights several network attributes and 
points to the need for a national network¬ 
ing policy. He also focuses on the defi¬ 
ciencies of ISDN and PBX for future net¬ 
works. Chorafas emphasizes the need to 
prototype new systems, explaining how 
expert systems can aid intelligent plan¬ 
ning and offering examples of current 
expert systems. 

The final section offers methods and 
solutions for systems integration. Chora¬ 
fas demonstrates why systems integra¬ 
tion is important, especially given the 
various incompatible systems already 
available. He includes such topics as 
multimedia and computer-aided publish¬ 
ing to underscore the need for software 
compatibility and high-performance sys¬ 
tems. He also explains the current attri¬ 
butes and nonexistent systems architec¬ 
ture for desktop publishing and docu¬ 
ment handling, and he presents a hypo¬ 
thetical model for electronic publishing. 

Chorafas also describes current smart 
buildings and their future role, claiming 
they are economically justified through 
gains in performance and productivity. 
Finally, he proposes that organizations 
enlist a chief technology officer responsible 
for strategic planning of corporate 
systems architecture and integration. 
This CTO would serve as an innovator 
and consultant to the board of directors. 

The book also includes a list of recom¬ 
mended technical and professional 
magazines, a list of abbreviations, and a 
1,000-term glossary. 

Anyone who has dealt with poorly designed 
systems will appreciate this book. 
I applaud Chorafas’ attempt to capture his 
wishful thinking and present it in a logical 
format. The arguments are presented 
well and he supports many of his conclusions 
with real-world experiences. 

Although the book is insightful and in¬ 
teresting in many ways, readers should 
note that it is about the future, not the pres¬ 
ent. I support Chorafas’ desire to reach a 
higher level of compatibility and integra¬ 
tion throughout the computing and net¬ 
working industry. However, as most en¬ 
gineers know, logic is only one factor — 
along with economics and compatibility 
with existing systems — that will deter¬ 
mine if Chorafas’ vision becomes reality. 

Jeffrey S. Vetter 

Hewlett-Packard 
Graphics Design and Animation on the IBM Microcomputers 

Julio Sanchez (Prentice Hall, Englewood Cliffs, N.J., 1990, 398 pp., $37) 


This book delivers a comprehensive 
yet concise tutorial on the graphical ker¬ 
nel system (GKS) model of computer 
graphics with a minimum of mathemat¬ 
ics. The math is explained well, and the 
book starts off with a good exposition of 
the aesthetics and perception of images. 
However, the book’s organization is a 
little strange, with chapters on device¬ 
level programming alternating with chap¬ 
ters on higher-level graphics functions. 

Much of the text is devoted to technical 
details of the 8086/8087 chips and the 
IBM BIOS. Most of the discussion would 
be lost on someone unfamiliar with as¬ 
sembly-language programming on the 
PC, and I’m not sure that a book of this na¬ 
ture is the place for such a tutorial. There 
are, in fact, several books that go into 
great detail about writing DOS and OS/2 
device drivers. And while the author does 
a good job of explaining the features and 
modes of the various graphics boards and 
the 8087 floating-point chip, he assumes 
some familiarity with how operating sys¬ 
tems are put together, how interrupts 
work, common pitfalls, etc. In other 
words, although it isn’t really necessary, 
it wouldn’t hurt to read an introductory 
book on operating systems if you really 
want to appreciate the sections on device 
specifics. 

The book would have been much better 
had the author devoted a chapter or two to 
the bit-level and device-specific details. 
Also, I disagree with the author’s use of 
assembly language to maximize perform¬ 
ance and compactness in all the routines. 
A good C compiler can come very close to 
what assembly language can accomplish 
in terms of performance and compactness 
of code, and a C program can be easier to 
understand and maintain than the same 
program written in assembly. 


Given the book’s depth otherwise, I was 
surprised there wasn’t more explanation 
of colors and color maps and how to use 
them. The title does limit the book’s scope 
to IBM microcomputers, but a discussion 
relating the PC to other types of graphics 
hardware would have been helpful. 

Overall, Sanchez has produced a good 
introduction to graphics programming. 
However, I doubt this would make a good 
course text by itself. There are many good 
diagrams, but not enough examples and 
no exercises. The text would be most ef¬ 
fective with access to an IBM PC or PS/2 
and the electronic form of the programs 
and drivers (which can be ordered for a 
nominal charge), but anyone interested in 
getting started in graphics programming 
would benefit from the book. 

Michael Ha 

Digital Equipment Corporation 


Quality Engineering Using Robust Design 

Madhav S. Phadke (Prentice Hall, Englewood Cliffs, N.J., 1989, 334 pp., $37) 


The only difference between a new 
graduate and a knowledgeable engineer 
who designs efficient equipment is expe¬ 
rience. This is a vague, qualitative dis¬ 
tinction, but one that everyone can under¬ 
stand. In Quality Engineering Using Ro¬ 
bust Design, Phadke takes the surprising 
step of introducing a quantitative defini¬ 
tion of quality, elaborating with a way to 
maximize quality that applies to every¬ 
thing from brick-making to computer 
system administration. In the process, 
Phadke removes some of the mystique of 
“experience.” 

Phadke defines quality and the factors 
that affect it with concepts familiar to any 
engineer: average loss, noise factors, and 
statistical analysis. He then gives a 
method to determine the best values for 
control variables that is essentially just 
an application of another familiar con¬ 
cept: superposition. Phadke finishes the 
book with descriptions of computer- 
aided design, dynamic-system optimiza¬ 
tion, and an application of his design 
method to increase product reliability. 
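
The superposition idea mentioned above can be sketched in a few lines: if each control factor contributes to the response independently of the others, the best overall setting can be found by picking the best level of each factor on its own. The function name, the factor names, the run data, and the larger-is-better convention below are all hypothetical and are not taken from the book.

```python
from collections import defaultdict


def best_levels(runs):
    """Pick the best level per factor under an additive (no-interaction) model.

    runs: list of (settings, response) pairs, where settings maps a factor
    name to its level and a larger response is assumed to be better.
    """
    by_factor = defaultdict(lambda: defaultdict(list))
    for settings, response in runs:
        for factor, level in settings.items():
            by_factor[factor][level].append(response)
    best = {}
    for factor, levels in by_factor.items():
        # Main effect of a level: mean response over all runs at that level
        means = {level: sum(v) / len(v) for level, v in levels.items()}
        best[factor] = max(means, key=means.get)
    return best


# Hypothetical four-run experiment on two two-level factors.
runs = [
    ({"temperature": "low", "pressure": "low"}, 10.0),
    ({"temperature": "low", "pressure": "high"}, 14.0),
    ({"temperature": "high", "pressure": "low"}, 13.0),
    ({"temperature": "high", "pressure": "high"}, 17.0),
]
choice = best_levels(runs)
```

This shortcut is only trustworthy when the factors do not interact; with strong interactions, the per-factor averages can point to a combination that is not actually the best.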

The many entertaining and informa¬ 
tive examples make this book really 
shine. I couldn’t help being intrigued by a 
design methodology that applies equally 
well to silicon deposition, brick-making, 
and electronic filters. But as much as I enjoyed 
reading about the applications of 
Phadke’s method (taken from the experiences 
of Phadke, his associates, and his 
students), I was bothered that its presen¬ 
tation is not complete. The methods work 
fine as long as all of a system’s control 
variables are independent, but Phadke 
spends little time discussing systems 
with interacting components. Also, 
Phadke tends to preach somewhat, which 
made me feel I was learning about Tao 
rather than engineering quality control. 

This book begins to address the long-standing 
lack of quality-control procedures 
applicable to computers. I definitely 
recommend it to practicing engineers 
who want to optimize anything. But 
new graduates — armed with what they 
think is complete engineering knowledge 
but soon faced with real design problems 
that may not make sense to them — would 
benefit most from the book. Overall, 
Quality Engineering Using Robust De¬ 
sign is entertaining, informative, and 
probably could be applied to any engi¬ 
neering problem. 

James Garnett 

Boulder, Colorado 


Reviewers wanted 

If you are interested in reviewing books for Computer, please submit your 
name, address, and a list of areas of interest and expertise to the Book Reviews 
Editor at the address below. Publishers should submit recent books for review 
consideration to the same address. 

Guy Johnson 

Department of Information Technology 
Rochester Institute of Technology 
1 Lomb Memorial Drive 
Rochester, NY 14623 
NEW LITERATURE 


Superconductors. Author Richard K. 
Miller reviews specific electronic and 
instrumentation applications based on 
Josephson junctions, thin film, and other 
technologies in Superconductors: Elec¬ 
tronics and Computer Applications 
(ISBN 0-88173-103-X, 270 pp., $95). 
Miller also discusses possible effects on 
the electronics industry and new product 
opportunities. Contact Fairmont Press, 
700 Indian Trail, Lilburn, GA 30247, 
phone (404) 925-9558. 

MIT AI. With topics ranging from 
demonstrated advances to theoretical 
proposals, Artificial Intelligence at MIT: 
Expanding Frontiers offers more than 40 
contributions in such areas as robotics, 
vision, natural language, learning and 
common-sense problem solving, and 
model-based reasoning systems. Patrick 
H. Winston, director of the MIT Artificial 
Intelligence Laboratory, and Sarah A. 
Shellard edited the two-volume set (ISBN 
0-262-23154-9), which costs $60 and 
comprises 1,200 pages. Each volume 
purchased separately is $35 (Vol. 1: ISBN 
0-262-23150-6; Vol. 2: ISBN 0-262-23151-4). 
Contact The MIT Press, 55 
Hayward St., Cambridge, MA 02142, 
phone (617) 253-2884. 

Optical computing. Optical Com¬ 
puter Architectures: The Application of 
Optical Concepts to Next Generation 
Computers (ISBN 0-471-63242-2, 400 
pp., $49.95) by Alistair D. McAulay describes 
the basic concepts of optical systems 
and explores optical computing devices, 
associative memories, interconnections, 
and optical logic. Contact John Wiley & 
Sons, 605 Third Ave., New York, NY 
10158-0012. 

Software journal. Software Engineering: 
Tools, Techniques, Practice is a 
new bimonthly journal focusing on tools 
and techniques to optimize software de¬ 
velopment. It recommends new service 
capabilities, management techniques, 
and ways to save on system maintenance. 
The subscription price for the 48-page 
journal is $145. Contact Auerbach Pub¬ 
lishers, One Penn Plaza, New York, NY 
10019, phone (212) 971-5000. 

Performance analysis. In The Art of 
Computer Systems Performance Analysis: 
Techniques for Experimental Design, 
Measurement, Simulation, and 
Modeling (ISBN 0-471-50336-3, 384 
pp., $39.95), author Raj Jain gives per¬ 
formance benchmarking techniques and 
includes case studies from Digital Equip¬ 
ment Corporation and information on 
such related topics as experimental sys¬ 
tems design, simulation, and data analy¬ 
sis. Contact John Wiley & Sons, 605 
Third Ave., New York, NY 10158-0012. 


contents CS MAGAZINES 


May IEEE Software 

Tools Fair 

Software Tools in Context, Dennis B. Smith 
and Paul Oman 

Performance Tools, Kathleen Nichols 

User-Interface Development Tools, Ed Lee 

CASE Analysis and Design Tools, Paul Oman 

Tools for Multiple-CPU Environments, 
Warren Harrison 

Testing Tools, Mike Lutz 


Maintenance Tools, Paul Oman 

Code Generators, Ted Lewis 

Integrated and Management Tools, Sorel 
Reisman 

Special Features 

A 3D Spreadsheet Based on Intensional 
Logic, Weichang Du and William W. Wadge 

A Hypertext System to Manage Software 
Life-Cycle Documents, Pankaj K. Garg and 
Walt Scacchi 

Special Report: 1989 Gordon Bell Prize, 

Jack Dongarra et al. 


May IEEE Computer 
Graphics and 
Applications 

A Notion for Interactive Behavioral Anima¬ 
tion Control, Jane Wilhelms and Robert 
Skinner 

A Note on the Use of Nonlinear Filtering in 
Computer Graphics, Mark E. Lee and Rich¬ 
ard A. Redner 

A Real-Time Particle System for Display of 
Ship Wakes, Michael E. Goss 

Ray Tracing Mirages, Marc Berger, Terry 
Trout, and Nancy Levit 


Magazine order form 

Single issue prices: 

Nonmember, $20 

Software 

Address 
City 
State/ZIP/Country 
IEEE-CS member no. (required for discount) 

Total 
(payment enclosed) 

Mail to: IEEE Computer Society Order Dept., 10662 Los Vaqueros Circle, 
PO Box 3014, Los Alamitos, CA 90720-1264. 


Computer-Assisted Surgery, Ludwig Adams 
et al. 


Registration of 3D Objects and Surfaces, 
Klaus D. Toennies et al. 

Dealing with the Ill-Conditioned Equations 
of Motion for Articulated Figures, Anthony 
A. Maciejewski 

Computing the Arc Length of Parametric 
Curves, Brian Guenter and Richard Parent 


For subscription information, circle 
number 200 on the reader service card. 
Weitek Your 
386/486! 

The new 4167 delivers up to 10 Megaflops when driven by 
NDP Fortran-486 and is supported by dozens of scientific, 
engineering and CAD applications. MicroWay provided the 
tools to develop many of these applications and supplies 
the interface cards required to use Weitek coprocessors 
in conjunction with an 80387, in both standard AT 
bus and MicroChannel machines. 
Number Smasher 386/25 

Our newest AT accelerator board replaces your 
80286 with an 80386 clocked at 20 or 25 MHz. It is 
socketed for 8 Megabytes of 32 bit RAM, an 80387 or 
3167 and a 64K SRAM cache. The NDP Fortran-386 driven 
3167 throughput at 25 MHz is 5.5 Megawhetstones. 

mW 3167/387 

This popular daughterboard (shown on the Number Smasher 
386/25) lets you plug a 3167 and an 80387 into a 386 system 
that has a single EMC socket. 


3167/4167 Numeric Performance 

                 3167/MCA   NS 386/25   NS 486/25 
Megawhetstones   3.4        5.5         12.2 
Megawhetscales   1.6        3.1         9.9 



This XT/AT motherboard replacement features a 25 
MHz 80486, 4167 and a BURST BUS memory interface. The 
BURST BUS architecture is ideal 
for engineering, scientific and 
CAD/CAM applications. The NDP 
Fortran-486 driven numeric throughput 
running with the 4167 is 12 Megawhetstones 
and 10 Megawhetscales 
(see BYTE 1989 IBM issue). 


mW3167/MCA 


Our MCA Weitek card runs in the IBM Model 
70 and 80. At 20 MHz, its performance is 2 
to 3 times that of an 80387. 

NDP Fortran-486 and C-486 
are globally optimized mainframe 
compilers that have 
been fine tuned for the 
80486 and 4167. 